Abstract

In Compressive Sensing, the Restricted Isometry Property (RIP) ensures that robust recov- 
ery of sparse vectors is possible from noisy, undersampled measurements via computationally 
tractable algorithms. It is by now well-known that Gaussian (or, more generally, sub-Gaussian) 
random matrices satisfy the RIP under certain conditions on the number of measurements. 
Their use can be limited in practice, however, due to storage limitations, computational con- 
siderations, or the mismatch of such matrices with certain measurement architectures. These 
issues have recently motivated considerable effort towards studying the RIP for structured ran- 
dom matrices. In this paper, we study the RIP for block diagonal measurement matrices where 
each block on the main diagonal is itself a sub-Gaussian random matrix. Our main result states 
that such matrices can indeed satisfy the RIP but that the requisite number of measurements 
depends on certain properties of the basis in which the signals are sparse. In the best case, these 
matrices perform nearly as well as dense Gaussian random matrices, despite having many fewer 
nonzero entries. 

Keywords — Compressive Sensing, Block Diagonal Matrices, Restricted Isometry Property 

1 Introduction 

Many interesting classes of signals have a low-dimensional geometric structure that can be exploited 
to design efficient signal acquisition and recovery methods. The emerging field of Compressive 
Sensing (CS) deals with signals that can be parsimoniously expressed in a basis or a dictionary. 
A canonical result in CS states that sparse signals, i.e., signals with very few nonzero entries, 
can be accurately recovered from a small number of linear measurements by solving a tractable 
convex optimization problem if the measurement system satisfies the Restricted Isometry Property 
(RIP) 0]. 

The RIP requires the linear measurement system to approximately maintain the distance between
any pair of sparse signals in the measurement space, implying that the geometry of the
family of sparse signals is approximately preserved in the measurement space. Apart from playing
a central role in the analysis of numerous signal recovery algorithms in CS [U \19\ ITT], the RIP
also provides a framework to analyze signal processing and inference algorithms in the compressed
measurement domain [10]. Moreover, measurement systems that satisfy the RIP, after undergoing
some minor modifications, can approximately preserve the geometry of an arbitrary point cloud (as
confirmed by the Johnson-Lindenstrauss lemma) [17] or a low-dimensional compact manifold [33].

A measurement system represented by a matrix populated with i.i.d. sub-Gaussian (see footnote 1)
random variables is known to satisfy the RIP with high probability whenever the number of rows scales
linearly with the sparsity of the signal and logarithmically with the length of the signal [I]. Such
matrices are also universal in that, with the same number of random measurements, they satisfy
the RIP with respect to any fixed sparsity basis with high probability. We refer to such matrices
(densely populated with i.i.d. random entries) as unstructured measurement matrices. There has
been significant recent interest in studying structured measurement systems because unstructured
random measurements may be undesirable due to memory limitations, computational costs, or
specific constraints in the data acquisition architecture. Many structured systems have been studied
in the CS literature, including subsampled bounded orthonormal systems [24], [21], random
convolution systems (described by partial Toeplitz [T-J] and circulant matrices [22], [TE]), and
deterministic matrix constructions [12]. Generally, structured random matrices require more
measurements to satisfy the RIP and lack the universality of unstructured random matrices.

In this paper, we are concerned with establishing the RIP for block diagonal matrices popu- 
lated with i.i.d. sub-Gaussian random variables. The advantages of such matrices are varied. First, 
these matrices require less memory and computational resources than their unstructured counter- 
parts. Second, they are particularly useful for representing acquisition systems with architectural 
constraints that prevent global data aggregation. For example, this type of architecture arises in 
distributed sensing systems where communication and environmental constraints limit the depen- 
dence of each sensor to only a subset of the data and in streaming applications where signals have 
data rates that necessitate operating on local signal blocks rather than on the entire signal simul- 
taneously. In these scenarios, the data may be divided naturally into discrete blocks, with each 
block acquired via a local measurement operator. 

To make things concrete, for some positive integers $J$, $N$, and $M$, set $\tilde{M} := JM$ and $\tilde{N} := JN$.
We model a signal $x \in \mathbb{C}^{\tilde{N}}$ as being partitioned into $J$ blocks of length $N$, i.e.,
$x = [x_1^T, \cdots, x_J^T]^T$ where $x_j \in \mathbb{C}^{N}$, $j \in [J]$. Here, $[J]$ denotes the set $\{1, 2, \cdots, J\}$.
As an example, $x$ can be a video sequence and $\{x_j\}$, $j \in [J]$, can be the individual frames in the video.
For each $j \in [J]$, we suppose that a linear operator $\Phi_j : \mathbb{C}^{N} \to \mathbb{C}^{M}$ collects the
measurements $y_j = \Phi_j x_j$. In our example, this means that each video frame $x_j$ is measured with an
operator $\Phi_j$. Concatenating all of the measurements into a vector $y \in \mathbb{C}^{\tilde{M}}$, we then have

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}
  = \begin{bmatrix} \Phi_1 & & & \\ & \Phi_2 & & \\ & & \ddots & \\ & & & \Phi_J \end{bmatrix}
    \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix}. \qquad (1)
$$

1 Roughly speaking, the tail of a sub-Gaussian random variable is similar to that of a Gaussian random variable.
This term is defined precisely in Section 2.

Thus we see that the overall measurement operator relating $y$ to $x$ will have a block diagonal
structure. In this paper we consider two scenarios. When the $\{\Phi_j\}$ are distinct, we call the
resulting matrix a Distinct Block Diagonal (DBD) matrix. When the $\{\Phi_j\}$ are all identical, we
call the resulting matrix a Repeated Block Diagonal (RBD) matrix.
Our results show that whenever the total number of measurements $\tilde{M}$ is sufficiently large, DBD
and RBD matrices can both satisfy the RIP. As we summarize in Sections 1.2 and 1.3 below, the 
requisite number of measurements depends on the type of matrix (DBD or RBD) and on the basis in 
which x has a sparse expansion. We also show that certain sparse matrices and random convolution 
systems considered in the CS literature can be studied in the framework of block diagonal matrices. 
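
The block diagonal measurement model described above can be sketched numerically. The following is only an illustrative sketch, assuming real Gaussian blocks and small hypothetical sizes (`J`, `M`, `N`, and the helper `block_diag` are our own choices, not part of the paper); it checks that measuring each block locally and concatenating the results agrees with applying the full block diagonal matrix.

```python
import numpy as np

def block_diag(blocks):
    """Place the given M x N blocks on the main diagonal of a (J*M) x (J*N) matrix."""
    J = len(blocks)
    M, N = blocks[0].shape
    out = np.zeros((J * M, J * N))
    for j, B in enumerate(blocks):
        out[j * M:(j + 1) * M, j * N:(j + 1) * N] = B
    return out

rng = np.random.default_rng(0)
J, M, N = 4, 3, 8  # hypothetical sizes: J blocks, each with M rows and N columns

# DBD: J independent Gaussian blocks. RBD: one Gaussian block repeated J times.
Phis = [rng.standard_normal((M, N)) / np.sqrt(M) for _ in range(J)]
dbd = block_diag(Phis)
rbd = block_diag([Phis[0]] * J)

x = rng.standard_normal(J * N)
x_blocks = x.reshape(J, N)  # row j holds the block x_j

# Local measurements y_j = Phi_j x_j, concatenated, match the global product.
y_local = np.concatenate([Phis[j] @ x_blocks[j] for j in range(J)])
y_global = dbd @ x
```

Only `J*M*N` of the `J*M * J*N` entries are nonzero, which is the source of the storage and computation savings mentioned earlier.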

In general, proving the RIP for structured measurement systems requires analytic tools beyond 
the elementary approaches that suffice for unstructured matrices. For example, in [21] the authors 
employed tools such as Dudley's inequality from the theory of probabilities in Banach spaces, and 
in [22] a variant of Dudley's inequality for chaos random processes was used to obtain a result 
that was out of reach for elementary approaches. While some of the ideas and techniques utilized 
in [21] and [22] can be used to establish the RIP for random block diagonal matrices (see [31] for
our preliminary study), even these sophisticated tools result in measurement rates that are worse 
than what we report in this paper. Fortunately, recent work by Krahmer et al. has established an 
improved bound on the suprema of chaos random processes that enabled them to prove the RIP 
for Toeplitz matrices with an optimal number of measurements [16]. The bound in [16] is very
general, and we have leveraged this result for the main results of this paper. Specifically, the work 
in [16] has allowed us to develop a unified treatment of DBD and RBD matrices with bounds on 
the measurement rates that are significantly improved over our preliminary work. 



1.1 Definition of the RIP 

A linear measurement operator satisfies the RIP if it acts as an approximate isometry on all
sufficiently sparse signals. More specifically, the Restricted Isometry Constant (RIC) of a matrix
$A \in \mathbb{R}^{\tilde{M} \times \tilde{N}}$ is defined as the smallest positive number $\delta_S$ for which

$$
(1 - \delta_S)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_S)\|x\|_2^2 \quad \text{for all } x \text{ with } \|x\|_0 \le S, \qquad (2)
$$

where $\|\cdot\|_0$ merely counts the number of nonzero entries of a vector. In many applications, however,
signals may be sparse in an orthobasis $U$ other than the canonical basis, and so we will find the
notion of the $U$-RIP more convenient.

Definition 1. Let $U$ denote an orthobasis for $\mathbb{C}^{\tilde{N}}$. The RIC of a matrix
$A \in \mathbb{R}^{\tilde{M} \times \tilde{N}}$ in the basis $U$, $\delta_S = \delta_S(A, U)$, is defined as the smallest
positive number for which

$$
(1 - \delta_S)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_S)\|x\|_2^2 \quad \text{for all } x \text{ with } \|U^* x\|_0 \le S, \qquad (3)
$$

where $U^*$ denotes the conjugate transpose of $U$.

In general, whenever the sparsity basis is clear from the context, U is dropped from our notation 
(after its first appearance) . More generally, the notion of the RIP could be extended to the class of 
signals that are sparse in an overcomplete dictionary [6]. Nonetheless, for the clarity of exposition, 
we restrict ourselves throughout this paper to considering only orthobases. 
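
To make the definition of the RIC concrete, the sketch below computes a Monte Carlo lower bound on $\delta_S$ in the canonical basis by sampling random supports of size $S$ and recording how far the extreme squared singular values of the corresponding column submatrix deviate from one. This is only a lower bound (the true RIC is a maximum over all supports, which is intractable to enumerate), and the function name, trial count, and sizes are our own hypothetical choices. For a general orthobasis $U$, the same routine could be applied to the product `A @ U`.

```python
import numpy as np

def ric_lower_bound(A, S, trials=200, seed=0):
    """Monte Carlo lower bound on delta_S: the worst deviation from 1 of the
    squared extreme singular values of A restricted to random S-column supports."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    worst = 0.0
    for _ in range(trials):
        support = rng.choice(n, size=S, replace=False)
        sv = np.linalg.svd(A[:, support], compute_uv=False)
        s_max = sv[0]
        # With more columns than rows the restriction has a nontrivial null space.
        s_min = sv[-1] if S <= m else 0.0
        worst = max(worst, abs(s_max ** 2 - 1.0), abs(s_min ** 2 - 1.0))
    return worst

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200)) / np.sqrt(80)  # unstructured Gaussian matrix
estimate = ric_lower_bound(A, S=4)
```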
1.2 The RIP for Distinct Block Diagonal (DBD) Matrices

Suppose the matrices $\{\Phi_j\}$, $j \in [J]$, in (1) are distinct, and let $\Psi$ denote the resulting block diagonal
matrix. Following [20], we say that $\Psi$ has a DBD structure. DBD matrices arise naturally when
modeling the information captured by individual (and different) sensors in a sensor network. In this
setting, $y_j = \Phi_j x_j$ represents the local measurements of the signal $x$ made by the $j$th sensor and thus
$y = \Psi x$ represents the total measurements of $x$ captured by the whole network. Or, as mentioned
previously, DBD matrices can be used to represent the process of measuring a video sequence
frame-by-frame, but where each frame is observed using a different measurement matrix. DBD
matrices also arise in the study of observability matrices in certain linear dynamical systems [29],
and as another example, DBD matrices can be used as a simplified model for the visual pathway
(because the information captured by the photoreceptors in the retina is aggregated locally by
horizontal and bipolar cells [15]). Due to their structure, DBD matrices can be transformed into
sparse measurement matrices after permutation of their rows and columns [I].

We have previously derived concentration inequalities for DBD matrices populated with
sub-Gaussian random variables [28], [20]. Rather than ensuring the stable embedding of an entire family
of sparse signals, these inequalities concern the probability that a bound such as (2) will hold for
a single, arbitrary (not necessarily sparse) signal $x$. We have shown that, unlike the case for
unstructured random matrices, the probability of concentration with DBD matrices is actually 
signal dependent, and in particular the concentration probability depends on the allocation of the 
signal energy among the signal blocks. However, for signals whose energy is nearly uniformly 
spread across the J blocks (this happens, for example, with signals that are sparse in the Fourier 
domain [20]), the highly structured DBD matrices can provide concentration performance that is 
on par with the unstructured matrices often used in CS. 

While concentration of measure inequalities are useful for applications concerning compressive
signal processing [10], it is not evident how such a concentration result can be extended to give an
RIP bound as strong as the one in this paper. Specifically, in Section 3.2 of this paper, we show
that if the total number of measurements $\tilde{M}$ scales linearly with the sparsity of the signal $S$ and
poly-logarithmically with the ambient dimension $\tilde{N}$, DBD matrices populated with sub-Gaussian
random entries will satisfy the $U$-RIP with high probability. In addition to a dependence on $S$
and N, however, our measurement bounds also reveal a dependence on a property known as the 
coherence of the sparsifying basis U. In this sense, the signal-dependent nature of our concentration 
of measure inequalities carries over to our RIP analysis for DBD matrices. (The fine details of how 
this occurs, however, are different.) Our study does confirm that for the class of signals that 
are sparse in the frequency domain, DBD matrices satisfy the RIP with approximately the same 
number of rows required in an unstructured Gaussian random matrix (despite having many fewer 
nonzero entries). 



1.3 The RIP for Repeated Block Diagonal (RBD) Matrices 

Alternatively, suppose the matrices $\{\Phi_j\}$, $j \in [J]$, in (1) are all equal, and let $\Xi$ denote the resulting
block diagonal matrix. Following [20], we say that $\Xi$ has an RBD structure. In the context of the
sensor network, video processing, and observability applications discussed before, RBD matrices
arise when the same measurement matrix is used for all the signal blocks. In the delay embedding
of dynamical systems, as another example, a time series is obtained by repeatedly applying a
scalar measurement function to the trajectory of a dynamical system. This time series can then
be embedded in a low-dimensional space (hence the name), and this embedding can be expressed
through an RBD measurement matrix provided that the scalar measurement function is linear [32].
Though not obvious at first glance, RBD matrices also have structural similarities with random
convolution matrices found in the CS literature [22], [16]. We revisit this connection in order to
re-derive the RIP for partially circulant random matrices in Section 3.3.2.



We have previously derived concentration inequalities for RBD matrices populated with
Gaussian random variables [28], [23], [20]. Our bounds for these matrices again reveal that the probability
of concentration is signal dependent. However, in this case, our bounds depend on both the alloca- 
tion of the signal energy among the signal blocks as well as the mutual orthogonality of the signal 
blocks. 

In Section 3.3 of this paper, we show that if the total number of measurements $\tilde{M}$ scales linearly
with $S$ and poly-logarithmically with $\tilde{N}$, RBD matrices populated with sub-Gaussian random
variables will satisfy the $U$-RIP with high probability. Our measurement bounds also reveal a
dependence on a property known as the block-coherence of the sparsifying basis U that quantifies 
the dependence between its row blocks. When the block-coherence of U is small, RBD matrices 
perform favorably compared to unstructured Gaussian random matrices. Most sparsifying bases 
are in fact favorable in this regard; we prove that the block-coherence of U is small when U is 
selected randomly. Once again, for RBD matrices, the signal dependent nature of concentration 
inequalities and the dependence of the RIP on the sparsifying basis emerge as two sides of the same 
coin. 



1.4 Outline 

This paper is organized as follows. Section 2 introduces the notation used throughout the rest of
the paper. Section 3 summarizes our main results regarding the RIP for DBD and RBD matrices;
these results are later proved in Section 5. Section 4 presents numerical simulations that illustrate
the dependence of signal recovery performance on the sparsifying basis $U$. We conclude the paper
with a short discussion in Section 6. We note that, for the reader's convenience, the Toolbox (Appendix A)
gathers some general tools from linear algebra and probability theory used in our analysis.



2 Notation 

We reserve the letters $C, C_1, C_2, \cdots$ to represent universal positive constants. We adopt the
following (semi-)order: $a \lesssim b$ means that there is an absolute constant $C_1$ such that $a \le C_1 b$. If
the constant depends on some parameter $c$, we write $a \lesssim_c b$. Also $a \gtrsim b$ and $a \gtrsim_c b$ are defined
similarly.

For an integer $S$, a signal with no more than $S$ nonzero entries is called $S$-sparse, and $S$ is
known as the sparsity level. In particular, $\|a\|_0$ denotes the number of nonzero entries of a vector
$a$. More generally, a signal that is a linear combination of at most $S$ columns of a basis is said to be
$S$-sparse in that basis. The conjugate transpose of a matrix $A$ will be denoted by $A^*$. In this paper,
$\mathrm{rank}(A)$ stands for the rank of matrix $A$. In addition to the regular $\ell_p$-norms in the Euclidean
spaces, $1 \le p \le \infty$, we use $\|A\|_2$ and $\|A\|_F$ to denote the spectral and Frobenius norms of a matrix
$A$, respectively. We use $\|A\|_{\max}$ to denote the largest entry of the matrix $A$ in magnitude. For
$1 \le p \le \infty$, the Schatten norm of order $p$ of a matrix $A$ is denoted by $\|A\|_{S_p}$ and is defined as

$$
\|A\|_{S_p} := \|\sigma_A\|_p,
$$

where $\sigma_A$ is the vector formed by the singular values of $A$. Observe that $\|A\|_{S_\infty} = \|A\|_2$ and
$\|A\|_{S_2} = \|A\|_F$. Throughout this paper, for a matrix $A$, $\mathrm{vec}(A)$ returns the vector formed by
stacking the columns of $A$. Also, we will use the conventions $[N] := \{1, 2, \cdots, N\}$ (for an integer
$N$) and $\#T$ for the cardinality of a set $T$.
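
As a quick numerical check of the two identities for the Schatten norm stated above, the following sketch (with a hypothetical helper `schatten_norm` and an arbitrary random test matrix) compares the Schatten norms of order $\infty$ and $2$ against NumPy's built-in spectral and Frobenius norms.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm: the l_p norm of the vector of singular values of A."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(sigma, ord=p)

A = np.random.default_rng(1).standard_normal((5, 7))
spectral = schatten_norm(A, np.inf)  # should equal the spectral norm of A
frobenius = schatten_norm(A, 2)      # should equal the Frobenius norm of A
```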

When it appears, the subscript of an expectation operator $\mathbb{E}$ specifies the (group of) random
variable(s) with respect to which the expectation is taken. For a random variable $Z$ taking values in
$\mathbb{C}$, we define $\mathbb{E}_p|Z| := (\mathbb{E}|Z|^p)^{1/p}$, $p \ge 1$. A random variable $Z$ is sub-Gaussian if its
sub-Gaussian norm, defined below, is finite [2?]:

$$
\|Z\|_{\psi_2} := \sup_{p \ge 1} \frac{1}{\sqrt{p}}\, \mathbb{E}_p|Z|. \qquad (4)
$$

Qualitatively speaking, the tail of (the distribution of) a sub-Gaussian random variable is similar to
that of a Gaussian random variable, hence the name. Finally, a Rademacher sequence is a sequence
of i.i.d. random variables that take the values $\pm 1$ with equal probability (and are independent of
everything else in their every appearance in this paper). In this paper, $\overset{\text{i.d.}}{=}$ means that the random
variables on both sides of the equality have the same distribution.

A set $\mathcal{C}(S, \|\cdot\|, r)$ is called a cover for the set $S$ at resolution $r$ and with respect to the metric
$\|\cdot\|$ if for every $x \in S$, there exists $x' \in \mathcal{C}(S, \|\cdot\|, r)$ such that $\|x - x'\| \le r$. The minimum
cardinality of all such covers is called the covering number of $S$ at resolution $r$ and with respect to
the norm $\|\cdot\|$, and is denoted here by $\mathcal{N}(S, \|\cdot\|, r)$.



3 Main Results 

3.1 Measures of Coherence 

Our results for random block diagonal matrices depend on certain properties of the sparsity basis,
i.e., the basis in which the signals have a sparse expansion. These properties are defined and studied
in this section; this sets the stage for a detailed statement of our main results in Sections 3.2 and 3.3.



3.1.1 Coherence Definitions 

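The coherence of an orthobasis $U \in \mathbb{C}^{\tilde{N} \times \tilde{N}}$ is defined as follows [7]:

$$
\mu(U) := \sqrt{\tilde{N}} \max_{p, q \in [\tilde{N}]} |U(p, q)|, \qquad (5)
$$

where $U(p, q)$ is the $(p, q)$th entry of $U$. If $\{u_{\tilde{n}}\}$ and $\{e_{\tilde{n}}\}$, $\tilde{n} \in [\tilde{N}]$, denote the columns of $U$ and
of the canonical basis for $\mathbb{C}^{\tilde{N}}$, respectively, one can easily verify that

$$
\mu(U) = \sqrt{\tilde{N}} \max_{p, q \in [\tilde{N}]} |\langle u_p, e_q \rangle|. \qquad (6)
$$

This allows us to interpret $\mu(U)$ as the similarity between $U$ and the canonical basis.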

A few more definitions are in order before we can define the second important property of a basis
used in this paper. For $\alpha \in \mathbb{C}^{\tilde{N}}$, set $x(\alpha) = x(\alpha, U) := U\alpha$, and define
$x_j(\alpha) = x_j(\alpha, U) \in \mathbb{C}^{N}$, $j \in [J]$, such that

$$
x(\alpha) = [x_1(\alpha)^T, x_2(\alpha)^T, \cdots, x_J(\alpha)^T]^T. \qquad (7)
$$

If we also define $U_j \in \mathbb{C}^{N \times \tilde{N}}$, $j \in [J]$, such that

$$
U = [U_1^T, U_2^T, \cdots, U_J^T]^T, \qquad (8)
$$

we observe that $x_j(\alpha) = U_j \alpha$ for every $j$. Define $X_R(\alpha, U) \in \mathbb{C}^{N \times J}$ as

$$
X_R(\alpha) = X_R(\alpha, U) := [\, x_1(\alpha) \;\; x_2(\alpha) \;\; \cdots \;\; x_J(\alpha) \,] = [\, U_1\alpha \;\; \cdots \;\; U_J\alpha \,].
$$

Now the block-coherence of $U$, denoted by $\gamma(U)$, is defined as

$$
\gamma(U) := \sqrt{J} \max_{\tilde{n} \in [\tilde{N}]} \|X_R(e_{\tilde{n}}, U)\|_2. \qquad (9)
$$

In words, $\gamma(U)$ is proportional to the maximal spectral norm when any column of $U$ is reshaped
into an $N \times J$ matrix. In analogy with the coherence $\mu(U)$, one can also think of $\gamma(U)$ as a
(non-commutative) coherence measure between $U$ and $I_{\tilde{N}}$ (see footnote 2). Qualitatively speaking,
$\gamma(U)$ measures the orthogonality and distribution of energy between the row-blocks of $U$. If for
every column of $U$, the energy is evenly distributed between its row-blocks and they are nearly
orthogonal, $\gamma(U)$ will be small and, as we will see later, better suited for our purposes. In the next
subsection, we compute the coherence and block-coherence of a few widely-used orthobases.



3.1.2 Computing the Coherence for a Few Orthonormal Bases 

It is easily verified that

$$
1 \le \mu(U) \le \sqrt{\tilde{N}}. \qquad (10)
$$

The upper bound is achieved, for example, by the canonical basis in $\mathbb{C}^{\tilde{N}}$, i.e., $\mu(I_{\tilde{N}}) = \sqrt{\tilde{N}}$. The
lower bound, on the other hand, is achieved by any basis that is maximally incoherent with the
canonical basis. For example, $\mu(F_{\tilde{N}}) = 1$, where $F_{\tilde{N}}$ denotes the Fourier basis in $\mathbb{C}^{\tilde{N}}$. The next
lemma, proved in Appendix B, indicates that most orthobases are also highly incoherent with the canonical
basis.

Lemma 1. Let R G R NxN denote a generic orthobasis in C N chosen randomly from the uniform 
distribution on the orthogonal group. Then the following holds for fixed t>l and N > i 2 log N: 

pL(R)>ty/]^\<N- t . (11) 

We now turn to computing the block-coherence of the same orthobases. Since every column of
$U$ has unit $\ell_2$-norm, it is easily observed that

$$
1 \le \gamma(U) \le \sqrt{J}. \qquad (12)
$$

Consider the canonical basis in $\mathbb{C}^{\tilde{N}}$. For every $\tilde{n} \in [\tilde{N}]$, $X_R(e_{\tilde{n}}, I_{\tilde{N}})$ has a single nonzero entry,
which equals 1, and thus $\|X_R(e_{\tilde{n}}, I_{\tilde{N}})\|_2 = 1$. Hence, $\gamma(I_{\tilde{N}}) = \sqrt{J}$. Moving on to $F_{\tilde{N}}$, we observe
that the entries of the first column of $F_{\tilde{N}}$ equal $\tilde{N}^{-1/2}$. As a result, the entries of $X_R(e_1, F_{\tilde{N}})$ all
equal $\tilde{N}^{-1/2}$. It follows that $\|X_R(e_1, F_{\tilde{N}})\|_2 = 1$ and therefore $\gamma(F_{\tilde{N}}) = \sqrt{J}$.
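
The values computed above for the canonical and Fourier bases can be reproduced numerically. The sketch below is a direct transcription of the definitions of coherence and block-coherence, under the assumption that the Fourier basis is the unitary DFT matrix; the function names and the small sizes `J` and `N` are hypothetical choices for illustration.

```python
import numpy as np

def coherence(U):
    """mu(U): sqrt of the total dimension times the largest entry magnitude."""
    return np.sqrt(U.shape[0]) * np.abs(U).max()

def block_coherence(U, J):
    """gamma(U): sqrt(J) times the largest spectral norm obtained when a column
    of U is reshaped into an N x J matrix whose j-th column is its j-th row-block."""
    n_total = U.shape[0]
    N = n_total // J
    worst = 0.0
    for n in range(n_total):
        X = U[:, n].reshape(J, N).T  # N x J; column j holds entries jN..(j+1)N-1
        worst = max(worst, np.linalg.norm(X, 2))
    return np.sqrt(J) * worst

J, N = 4, 8
n_total = J * N
I = np.eye(n_total)
F = np.fft.fft(np.eye(n_total)) / np.sqrt(n_total)  # unitary DFT matrix

mu_I, mu_F = coherence(I), coherence(F)
gamma_I, gamma_F = block_coherence(I, J), block_coherence(F, J)
```

Every column of the DFT matrix reshapes into a rank-one matrix of unit Frobenius norm, which is why the Fourier basis attains the upper bound on the block-coherence.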

2 It can be easily verified that, in general, maxs ||Xfl(en, U)\\2 7^ maxs ij^)||2, where u„ is the nth column 

off/. 
Because the canonical basis and the Fourier basis, which one might naturally consider to be
opposite ends on some spectrum of orthobases, both have large block-coherence, one might wonder
whether any orthobasis could have small block-coherence. As we will see, most possible orthobases
actually do have small block-coherence. For example, consider the generic orthobasis $R$ constructed
in Lemma 1. The columns of $X_R(e_1, R)$ are $J$ random vectors in $\mathbb{R}^{N}$. These vectors are weakly
dependent because the first column of $R$ has unit $\ell_2$-norm. With high probability, the length
of each vector is approximately $1/\sqrt{J}$ (so that the $\ell_2$-norm of the first column of $R$ is one). If
$J < N$, then with high probability these points are spread out in $\mathbb{R}^{N}$ so that $\|X_R(e_1, R)\|_2 \approx 1/\sqrt{J}$.
Now, since the columns of $R$ have the same distribution, $\|X_R(e_{\tilde{n}}, R)\|_2 \approx 1/\sqrt{J}$ for every $\tilde{n} \in [\tilde{N}]$.
Therefore, $\gamma(R) \approx 1$, which is much smaller than the block-coherence of the canonical and Fourier
bases. The next result, proved in Appendix C, formalizes this discussion.

Lemma 2. Consider the generic orthobasis R constructed in Lemma 1. For fixed t < 1, the
following holds if J < N and N > t~ 2 log N: 

pLwzi+JZ+tiZN-*- ( i3 ) 
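
The heuristic argument for generic orthobases can be probed numerically. The sketch below draws a random orthogonal matrix via a sign-corrected QR factorization of a Gaussian matrix (a standard way to sample from the uniform distribution on the orthogonal group) and evaluates its block-coherence. The sizes, the seed, and the helper names are our own assumptions; this illustrates the discussion above and is not the construction used in the proof.

```python
import numpy as np

def block_coherence(U, J):
    """gamma(U) = sqrt(J) * max spectral norm of a column reshaped to N x J."""
    n_total = U.shape[0]
    N = n_total // J
    norms = [np.linalg.norm(U[:, n].reshape(J, N).T, 2) for n in range(n_total)]
    return np.sqrt(J) * max(norms)

def random_orthobasis(n, rng):
    """Sample an n x n orthogonal matrix uniformly from the orthogonal group."""
    G = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(G)
    return Q * np.sign(np.diag(R))  # column sign fix removes the QR sign ambiguity

rng = np.random.default_rng(2)
J, N = 4, 64  # hypothetical sizes with J much smaller than N
R = random_orthobasis(J * N, rng)

gamma_R = block_coherence(R, J)              # expected to be close to 1
gamma_I = block_coherence(np.eye(J * N), J)  # equals sqrt(J) for the canonical basis
```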



We close this section by noting that Section 3.3.2 provides an example of a deterministic basis
with small block-coherence. (This is then used to prove the RIP for partial random circulant
matrices.)

3.2 The RIP for DBD Matrices 



Let $\Phi \in \mathbb{R}^{M \times N}$ denote a matrix populated with i.i.d. sub-Gaussian random variables having mean
zero, standard deviation $1/\sqrt{M}$, and sub-Gaussian norm $\tau/\sqrt{M}$, for some $\tau > 0$. Take $\{\Phi_j\}$ in (1)
to be $J$ independent copies of $\Phi$ and let $\Psi \in \mathbb{R}^{\tilde{M} \times \tilde{N}}$ denote the resulting block diagonal matrix
in (1). Our first main result, proved in Section 5, establishes the RIP for DBD matrices with this
construction.



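Theorem 1. Let $U$ denote an orthobasis for $\mathbb{C}^{\tilde{N}}$ and define $\tilde{\mu}(U) := \min\left(\sqrt{J}, \mu(U)\right)$. If $S > 1$
and

$$
\tilde{M} \gtrsim_\tau \delta^{-2} \cdot \tilde{\mu}^2(U) \cdot S \cdot \log^2 S \log^2 \tilde{N}, \qquad (14)
$$

then $\delta_S(\Psi, U) \le \delta < 1$, except with a probability of at most $O(\tilde{N}^{-\log \tilde{N} \log^2 S})$.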

A few remarks are in order. The requisite number of measurements is linear in the sparsity
level $S$ and (poly-)logarithmic in the ambient dimension $\tilde{N}$, on par with an unstructured random
Gaussian matrix. More importantly, the requisite number of measurements scales with $\tilde{\mu}^2(U)$,
which takes a value in the interval $[1, J]$. For the Fourier basis, we calculated that $\mu(F_{\tilde{N}}) = 1$.
Therefore, when measuring signals that are sparse in the frequency domain, we observe that a
DBD matrix compares favorably to an unstructured Gaussian matrix of the same size. This is in
the sense that they both require the same number of measurements to achieve the RIP (up to a
poly-logarithmic factor).

On the other hand, when the orthobasis $U$ is highly coherent with the canonical basis, the
requisite number of measurements is proportional to $SJ$ (instead of $S$). While possibly unfavorable,
this is indeed necessary (to within a poly-logarithmic factor) to achieve the RIP in some cases. For
example, recall that $\mu(I_{\tilde{N}}) = \sqrt{\tilde{N}}$ and so $\tilde{\mu}(I_{\tilde{N}}) = \sqrt{J}$. To see why the results are optimal in this
case, consider the class of $S$-sparse signals in $I_{\tilde{N}}$ whose nonzero entries are located within the first
length-$N$ block of the signal. Achieving a stable embedding of this class of signals requires $\Phi_1$ itself
to satisfy the RIP. This matrix, $\Phi_1$, is an unstructured sub-Gaussian matrix, and ensuring that it
satisfies the RIP requires $M \gtrsim_\tau \delta^{-2} S \log(N/S)$ [I]. Consequently, ensuring the $I_{\tilde{N}}$-RIP for $\Psi$ is
only possible when $\tilde{M} \gtrsim_\tau \delta^{-2} J S \log(N/S)$, as predicted by Theorem 1 (up to a poly-logarithmic
term). The required number of measurements in this case can still be parsimonious ($\tilde{M} \ll \tilde{N}$),
however, if the sparsity level $S$ of the signal $x$ is much less than $N$, the length of each signal block
$x_j$.

As a final note, Theorem 1 implies the RIP for a certain class of sparse matrices which are of
potential interest in their own right OH]. 

Corollary 1. Let $U$ denote an orthobasis for $\mathbb{C}^{\tilde{N}}$ and define $\tilde{\mu}(U) := \min\left(\sqrt{J}, \mu(U)\right)$. Let
$\Psi'$ denote the (sparse) matrix obtained by an arbitrary permutation of the rows and columns of $\Psi$. If
$S > 1$ and

$$
\tilde{M} \gtrsim_\tau \delta^{-2} \cdot \tilde{\mu}^2(U) \cdot S \cdot \log^2 S \log^2 \tilde{N}, \qquad (15)
$$

then $\delta_S(\Psi', U) \le \delta < 1$, except with a probability of at most $O(\tilde{N}^{-\log \tilde{N} \log^2 S})$.

Proof. Without loss of generality, consider no permutation in the rows and let $P_c \in \mathbb{R}^{\tilde{N} \times \tilde{N}}$ denote
the permutation matrix for the columns of $\Psi$. Since $P_c U$ has the same coherence as $U$, the claim
follows by applying Theorem 1. □
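
The two facts used in the proof can be checked numerically: permuting the columns of the block diagonal matrix is the same as absorbing a permutation into the basis, and permuting the rows of an orthobasis leaves its coherence unchanged. The sketch below uses small hypothetical sizes and a random real orthobasis as a stand-in for $U$.

```python
import numpy as np

rng = np.random.default_rng(3)
J, M, N = 3, 2, 4  # hypothetical sizes
n_total = J * N

# DBD matrix Psi with independent Gaussian blocks.
Psi = np.zeros((J * M, n_total))
for j in range(J):
    Psi[j * M:(j + 1) * M, j * N:(j + 1) * N] = rng.standard_normal((M, N)) / np.sqrt(M)

U, _ = np.linalg.qr(rng.standard_normal((n_total, n_total)))  # some real orthobasis
col_perm = rng.permutation(n_total)
P_c = np.eye(n_total)[:, col_perm]  # Psi @ P_c permutes the columns of Psi

def coherence(B):
    """mu(B): sqrt of the dimension times the largest entry magnitude."""
    return np.sqrt(B.shape[0]) * np.abs(B).max()

Psi_perm = Psi[:, col_perm]  # sparse matrix that is no longer block diagonal
```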

3.3 The RIP for RBD Matrices 
3.3.1 Main Result for RBD Matrices 

Let $\Phi \in \mathbb{R}^{M \times N}$ denote a matrix populated with i.i.d. sub-Gaussian random variables having mean
zero, standard deviation $1/\sqrt{M}$, and sub-Gaussian norm $\tau/\sqrt{M}$, for some $\tau > 0$. Take $\Phi_j = \Phi$
for every $j \in [J]$ in (1) and let $\Xi \in \mathbb{R}^{\tilde{M} \times \tilde{N}}$ denote the resulting block diagonal matrix in (1).
Our second main result, also proved in Section 5, establishes the RIP for RBD matrices with this
construction.

Theorem 2. Let $U$ denote an orthobasis for $\mathbb{C}^{\tilde{N}}$. If $S > 1$ and

$$
\tilde{M} \gtrsim_\tau \delta^{-2} \cdot \gamma^2(U) \cdot S \cdot \log^2 S \log^2 \tilde{N},
$$

then $\delta_S(\Xi, U) \le \delta < 1$, except with a probability of at most $O(\tilde{N}^{-\log \tilde{N} \log^2 S})$.

A few remarks are in order. At one end of the spectrum, the block-coherence of an orthobasis
could equal $\sqrt{J}$ and consequently the required number of measurements above would scale with $JS$.
This happens for signals that are sparse, for example, in the time (canonical basis) or frequency
domains. Our result is indeed optimal for both of these bases (up to a poly-logarithmic factor). The
same argument for the canonical basis carries over from the DBD matrices. For the Fourier basis,
we note that it is possible to construct certain classes of periodic signals in $\mathbb{C}^{\tilde{N}}$ that would require
the lower bound on $\tilde{M}$ to scale with $JS$. Consider, for example, the class of signals consisting of
all $S$-sparse combinations of columns $1, J + 1, \ldots, (N - 1)J + 1$ from $F_{\tilde{N}}$. If $x$ belongs to this class,
then, by construction, $x_1, x_2, \cdots, x_J$ (as defined in (7)) are all equal because $x$ is periodic with a
period of $N$. As a result, different blocks of $\Xi$ take the same measurements from $x$. Therefore, as
was the case with the DBD matrices, obtaining a stable embedding of this class of signals requires
$M \gtrsim_\tau \delta^{-2} S \log(N/S)$, and equivalently, $\tilde{M} \gtrsim_\tau \delta^{-2} J S \log(N/S)$.
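
The periodicity argument for the Fourier basis can be illustrated with a short numerical sketch. It assumes the Fourier basis is the unitary DFT matrix and uses zero-based indexing, so the relevant columns are those whose index is a multiple of $J$; the sizes and the random coefficients are hypothetical. Signals built from such columns have identical blocks, so an RBD matrix collects the same measurements from every block.

```python
import numpy as np

J, N, S = 4, 8, 3  # hypothetical sizes
n_total = J * N
F = np.fft.fft(np.eye(n_total)) / np.sqrt(n_total)  # unitary DFT matrix

rng = np.random.default_rng(4)
# Zero-based columns 0, J, 2J, ... correspond to frequencies that are multiples of J.
cols = rng.choice(np.arange(0, n_total, J), size=S, replace=False)
alpha = np.zeros(n_total, dtype=complex)
alpha[cols] = rng.standard_normal(S) + 1j * rng.standard_normal(S)

x = F @ alpha
blocks = x.reshape(J, N)  # row j is the block x_j

# An RBD matrix with repeated block Phi therefore yields J identical measurement vectors.
Phi = rng.standard_normal((2, N))
measurements = blocks @ Phi.T  # row j is Phi x_j
```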

At the other end of the spectrum, for a generic orthobasis $R$ we computed that $\gamma(R) \lesssim 1$
with high probability. For signals that are sparse in this basis (and therefore for many possible
orthobases in general), an RBD matrix performs nearly as well as an unstructured Gaussian random
matrix. The RBD structure allows us to prove the RIP for certain classes of structured random
matrices as a special case of Theorem 2. In particular, in the next subsection we re-derive an
RIP bound for partial random circulant matrices that originally appeared in [16]. As a byproduct,
we also construct a deterministic sparsity basis that achieves a performance similar to the generic
orthobasis we have considered above.



3.3.2 The RIP for Partial Random Circulant Matrices 



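This section demonstrates that the RBD model, together with Theorem 2, can be used to derive
the RIP for partial random circulant matrices (see footnote 3). More specifically, we focus on proving
the RIP for $\Gamma \in \mathbb{R}^{J \times P}$, with $J \le P$, defined as

$$
\Gamma := \frac{1}{\sqrt{J}}
\begin{bmatrix}
\epsilon_1 & \epsilon_2 & \cdots & \epsilon_P \\
\epsilon_P & \epsilon_1 & \cdots & \epsilon_{P-1} \\
\vdots & \vdots & \ddots & \vdots \\
\epsilon_{P-J+2} & \epsilon_{P-J+3} & \cdots & \epsilon_{P-J+1}
\end{bmatrix},
$$

where $\{\epsilon_p\}$, $p \in [P]$, is a sequence of i.i.d. zero-mean, unit-variance random variables with
sub-Gaussian norm $\tau$. We let $\epsilon$ denote the vector formed by $\{\epsilon_p\}$. In order to use Theorem 2 in
this setting, we make the following argument: for any signal $x \in \mathbb{C}^{P}$, we can write $\Gamma x$ as the
multiplication of an RBD matrix $\Xi \in \mathbb{R}^{J \times PJ}$ and an extended vector $\tilde{x} \in \mathbb{C}^{PJ}$:

$$
\Gamma x = \frac{1}{\sqrt{J}}
\begin{bmatrix} \epsilon^* \mathcal{S}^0 x \\ \epsilon^* \mathcal{S}^1 x \\ \vdots \\ \epsilon^* \mathcal{S}^{J-1} x \end{bmatrix}
= \begin{bmatrix} \epsilon^* & & & \\ & \epsilon^* & & \\ & & \ddots & \\ & & & \epsilon^* \end{bmatrix}
\cdot \frac{1}{\sqrt{J}}
\begin{bmatrix} \mathcal{S}^0 x \\ \mathcal{S}^1 x \\ \vdots \\ \mathcal{S}^{J-1} x \end{bmatrix}
=: \Xi \cdot \frac{\tilde{x}}{\sqrt{J}}, \qquad (16)
$$

where $\mathcal{S}$ is the cyclic shift-up operator on column vectors in $\mathbb{C}^{P}$. The next lemma (proved in Appendix D)
states that if $x$ is sparse, then $\tilde{x}$ has a sparse representation in a favorable orthobasis $T$ (constructed
in the proof).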

Lemma 3. There exists an orthobasis $T$ with $\gamma(T) = 1$, such that every $\tilde{x}$ has an $S$-sparse
representation in $T$ if the corresponding $x$ is $S$-sparse. That is, for every such $\tilde{x}$, there exists $x_e$ with
$\|x_e\|_0 \le S$ such that $\tilde{x}/\sqrt{J} = T x_e$.
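
The identification of a partial circulant matrix with an RBD matrix acting on an extended vector can be verified numerically. The following sketch assumes real-valued Gaussian generators, takes the cyclic shift-up operator to map the entry at position $p+1$ to position $p$ (i.e., `np.roll(x, -1)`), and uses small hypothetical sizes `J` and `P`.

```python
import numpy as np

rng = np.random.default_rng(5)
J, P = 3, 7  # hypothetical sizes with J <= P
eps = rng.standard_normal(P)

# Row j (zero-based) of the partial circulant matrix is eps cyclically shifted
# to the right by j positions, with an overall 1/sqrt(J) normalization.
Gamma = np.stack([np.roll(eps, j) for j in range(J)]) / np.sqrt(J)

x = rng.standard_normal(P)
# Extended vector: stack x, (shift-up of x), ..., (shift-up applied J-1 times).
x_ext = np.concatenate([np.roll(x, -j) for j in range(J)])

# RBD matrix with the single 1 x P block eps^T repeated J times on the diagonal.
Xi = np.kron(np.eye(J), eps[None, :])

lhs = Gamma @ x
rhs = Xi @ x_ext / np.sqrt(J)
```

The extended vector has norm $\sqrt{J}$ times that of $x$, which is why the normalization by $\sqrt{J}$ appears on both sides.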

3 The arguments in this section extend without much effort to the more general case of establishing the RIP for 
partial random Toeplitz matrices. 
Therefore, the $T$-RIP for $\Xi$ implies the RIP for $\Gamma$. To be more specific, after setting $M = 1$,
Theorem 2 implies that $\delta_S(\Gamma) \le \delta_S(\Xi, T) \le \delta$ except with a probability of at most
$O(P^{-\log P \log^2 S})$, provided that

$$
J \gtrsim_\tau \delta^{-2} \cdot S \log^2 S \log^2 P,
$$

which is equivalent to Theorem 1.1 in [16].



4 Numerical Simulations 

This section contains a series of simulations that are intended to reinforce our findings for the
reader. An ideal scenario would involve generating several random block diagonal matrices while
varying the sparsity level $S$ and the number of measurements $M$ and measuring the fraction of
realizations in which the RIC falls below a fixed threshold. Checking the RIP for a matrix is,
however, known to be an NP-hard problem [26]; this encourages us to examine a proxy for the RIP.
As discussed in Section 1.1, a major application of the RIP is to ensure robust recovery of sparse
signals via algorithms such as Basis Pursuit (BP). Similar to [13], then, we measure the success
of sparse signal recovery as an alternative to verifying the RIP. This is detailed in the following.

With N = 100 and J = 10 (and consequently, $\tilde{N} = 1000$), we generate an $\tilde{N} \times \tilde{N}$ DBD
matrix whose entries are i.i.d. Gaussian random variables having zero mean and unit standard
deviation. Note that each diagonal block is of size N × N. For each pair (S, M) ∈ [M] × [N], the
following procedure is executed. An $\tilde{M} \times \tilde{N}$ DBD matrix $\Phi$ is formed by keeping (and appropriately
normalizing) the first M rows of each diagonal block of the large $\tilde{N} \times \tilde{N}$ matrix. Then 20 random
S-sparse signals in the canonical basis are generated. These sparse signals are measured
using $\Phi$ and reconstructed back (from incomplete measurements) via BP.⁵ We deem the recovery
successful if the relative $\ell_2$ reconstruction error is less than the fixed threshold $10^{-2}$. The fraction of
successful recoveries is recorded and this procedure is repeated for other pairs (S, M). The resulting
phase transition graph is depicted in Figure 1a, where the color of each pixel ranges from black for
perfect recovery in every realization to white for failed recovery every time.
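The procedure above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the code used for the figures: the problem sizes are much smaller than in the paper, the helper names (`make_dbd`, `omp`, `success_rate`) are hypothetical, and Orthogonal Matching Pursuit is swapped in for Basis Pursuit so that the example needs no external solver.

```python
import numpy as np

def make_dbd(N, J, M, rng):
    """Block diagonal matrix of size (J*M) x (J*N) with J distinct Gaussian blocks."""
    Phi = np.zeros((J * M, J * N))
    for j in range(J):
        # M Gaussian rows per block, scaled so that E||Phi x||^2 = ||x||^2.
        Phi[j * M:(j + 1) * M, j * N:(j + 1) * N] = rng.standard_normal((M, N)) / np.sqrt(M)
    return Phi

def omp(A, y, S):
    """Orthogonal Matching Pursuit: a greedy stand-in for BP (assumption of this sketch)."""
    col_norms = np.linalg.norm(A, axis=0)
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(S):
        corr = np.abs(A.T @ residual) / col_norms
        if support:
            corr[support] = -1.0  # never pick the same column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

def success_rate(N, J, M, S, trials, rng, tol=1e-2):
    """Fraction of trials whose relative l2 error is below tol (canonical-sparse signals)."""
    Phi = make_dbd(N, J, M, rng)
    wins = 0
    for _ in range(trials):
        x = np.zeros(J * N)
        support = rng.choice(J * N, size=S, replace=False)
        x[support] = rng.standard_normal(S)
        x_hat = omp(Phi, Phi @ x, S)
        wins += int(np.linalg.norm(x - x_hat) <= tol * np.linalg.norm(x))
    return wins / trials

rng = np.random.default_rng(0)
for M in (4, 8, 16):
    print(M, success_rate(N=20, J=5, M=M, S=5, trials=20, rng=rng))
```

Sweeping `S` and `M` over a grid and plotting the returned fractions would give a phase transition diagram of the same kind as Figure 1, up to the change of solver.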

We repeat the above simulation for signals that are sparse in the Fourier and generic bases
(see Lemma 1). A new generic basis is generated in each iteration. All of the above simulations
are then repeated with an RBD matrix. The results of the simulations are displayed in Figures 1
and 2. In the simulations with DBD matrices, recovery of canonical-sparse and frequency-sparse signals
is, respectively, the least and most successful of the three instances. Signals that are sparse in the
generic bases can be recovered nearly as well as frequency sparse signals.

In the simulations with RBD matrices, signals that are sparse in the generic bases are recovered 
best, while the results for the canonical and Fourier bases are less satisfactory. These observations 
are in agreement with our findings in Section 3. We do point out, however, that for the case of RBD
measurement matrices we are able to recover frequency sparse signals somewhat better than signals
that are sparse in the canonical basis. We note that such a difference is not reflected in our RIP
bounds. While this performance difference could be due to a lack of explicit numerical constants 
in our results, it is more likely an artifact of our simulations: We are not directly confirming the 
RIP but rather testing the recovery of randomly generated test signals, and those few signals which 

4 In a nutshell, BP casts the signal recovery problem as a linear program, which can be efficiently solved. We refer 
the reader to [5] for more detail. 

5 In order to implement BP, we used YALL1, a package for solving l\ problems [341 13U] . 
make it difficult to satisfy the RIP may be more pathological in the Fourier basis than in
the canonical basis.




[Figure 1 panels: (a) Canonical basis, (b) Fourier basis, (c) Generic basis. Horizontal axis: JM/JN.]

Figure 1: Simulation results for DBD matrices. Refer to Section 4 for details.

[Figure 2 panel: (c) Generic basis. Horizontal axis: JM/JN.]

Figure 2: Simulation results for RBD matrices. Refer to Section 4 for details.

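5 Proofs of Theorems 1 and 2
5.1 Preliminaries

First, define the set of all S-sparse signals with unit norm as

$$\Omega_S := \left\{ \alpha \in \mathbb{C}^{\tilde{N}} : \|\alpha\|_0 \le S,\ \|\alpha\|_2 = 1 \right\}. \tag{17}$$

With this definition, we observe that the RIC for DBD matrices can be written as

$$\delta_S = \sup_{\alpha \in \Omega_S} \left| \|\Phi \cdot x(\alpha)\|_2^2 - 1 \right|,$$

where we leveraged the fact that $\|\alpha\|_2 = 1$ implies

$$\mathbb{E}\left\{ \|\Phi \cdot x(\alpha)\|_2^2 \right\} = \alpha^* U^* \, \mathbb{E}\{\Phi^* \Phi\} \, U \alpha = \alpha^* \alpha = 1.$$

The RIC for RBD matrices can be similarly written as

$$\delta_S = \sup_{\alpha \in \Omega_S} \left| \|\Phi \cdot x(\alpha)\|_2^2 - 1 \right|.$$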
Given $\delta < 1$ and under the conditions in Theorems 1 and 2, our objective is to show that $\delta_S \le \delta$
for both DBD and RBD matrices. To achieve this goal, we require the following powerful result
due to Krahmer et al.:

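Theorem 3. [16, Theorem 3.1] Let $\mathcal{A} \subset \mathbb{C}^{M \times N}$ be a set of matrices, and let $\varepsilon$ be a random vector
whose entries are i.i.d., zero-mean, unit-variance random variables with sub-Gaussian norm $\tau$. Set

$$d_F(\mathcal{A}) := \sup_{A \in \mathcal{A}} \|A\|_F, \qquad d_2(\mathcal{A}) := \sup_{A \in \mathcal{A}} \|A\|_2,$$

and

$$\begin{aligned}
E_1 &:= \gamma_2(\mathcal{A}, \|\cdot\|_2) \left( \gamma_2(\mathcal{A}, \|\cdot\|_2) + d_F(\mathcal{A}) \right) + d_F(\mathcal{A})\, d_2(\mathcal{A}), \\
E_2 &:= d_2(\mathcal{A}) \left( \gamma_2(\mathcal{A}, \|\cdot\|_2) + d_F(\mathcal{A}) \right), \\
E_3 &:= d_2^2(\mathcal{A}).
\end{aligned}$$

Then, for t > 0, it holds that

$$\log \mathbb{P}\left\{ \sup_{A \in \mathcal{A}} \left| \|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2 \right| \gtrsim_\tau E_1 + t \right\} \lesssim_\tau - \min\left( \frac{t^2}{E_2^2}, \frac{t}{E_3} \right).$$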


Without going into the details, we note that the $\gamma_2$-functional of $\mathcal{A}$, $\gamma_2(\mathcal{A}, \|\cdot\|_2)$, is a geometrical
property of $\mathcal{A}$, i.e., the index set of the random process, and is widely used in the context of
probability in Banach spaces [25, 18]. In particular, the following lemma gives an estimate of this
quantity.

Lemma 4. [23 H7] With A as defined above, it holds that 



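$$\gamma_2(\mathcal{A}, \|\cdot\|_2) \lesssim \int_0^\infty \log^{1/2}\left( \mathcal{N}(\mathcal{A}, \|\cdot\|_2, \nu) \right) d\nu. \tag{18}$$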



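Clearly, we need to express the problem of bounding the RIC of DBD and RBD matrices in
a form that is amenable to the setting of Theorem 3. First, for DBD matrices, let us define
$X_{D,j} \in \mathbb{C}^{M \times MN}$, $j \in [J]$, as

$$X_{D,j}(\alpha) = X_{D,j}(\alpha, U) := \begin{bmatrix} x_j^*(\alpha) & & & \\ & x_j^*(\alpha) & & \\ & & \ddots & \\ & & & x_j^*(\alpha) \end{bmatrix}. \tag{19}$$

It can then be easily verified that

$$\begin{aligned}
\|\Phi \cdot x(\alpha)\|_2^2 &= \sum_{j \in [J]} \|\Phi_j \cdot x_j(\alpha)\|_2^2 \\
&= \sum_{j \in [J]} \|X_{D,j}(\alpha) \cdot \mathrm{vec}(\Phi_j^*)\|_2^2 \\
&= \frac{1}{M} \sum_{j \in [J]} \|X_{D,j}(\alpha) \cdot \varepsilon_j\|_2^2 \\
&= \|A_D(\alpha) \cdot \varepsilon\|_2^2,
\end{aligned}$$

where the linear map $A_D : \mathbb{C}^{\tilde{N}} \to \mathbb{C}^{JM \times JMN}$ is defined as

$$A_D(\alpha) = A_D(\alpha, U) := \frac{1}{\sqrt{M}} \begin{bmatrix} X_{D,1}(\alpha) & & & \\ & X_{D,2}(\alpha) & & \\ & & \ddots & \\ & & & X_{D,J}(\alpha) \end{bmatrix},$$

and entries of $\varepsilon_j \in \mathbb{R}^{MN}$, $j \in [J]$, and $\varepsilon \in \mathbb{R}^{JMN}$ are i.i.d. zero-mean, unit-variance random variables
with sub-Gaussian norm $\tau$. The index set of the random process is $\mathcal{A}_D := \{A_D(\alpha) : \alpha \in \Omega_S\}$. We
have therefore completely expressed the DBD problem in the setting of Theorem 3.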
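The reformulation of the DBD problem can be checked numerically. The sketch below is an illustration under simplifying assumptions: all quantities are real (so transposes replace conjugate transposes), blocks are vectorized row by row, the $1/\sqrt{M}$ normalization is placed in the map playing the role of $A_D$, and the helper `block_diag` is hypothetical.

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays along the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

rng = np.random.default_rng(0)
N, J, M = 4, 3, 2
x = rng.standard_normal(J * N)                        # plays the role of x(alpha) = U alpha
G = [rng.standard_normal((M, N)) for _ in range(J)]   # unnormalized Gaussian blocks
Phi = block_diag([Gj / np.sqrt(M) for Gj in G])       # DBD measurement matrix

# X_j carries M copies of x_j^T on its diagonal, so X_j @ vec(G_j) equals G_j @ x_j.
X = [np.kron(np.eye(M), x[j * N:(j + 1) * N][None, :]) for j in range(J)]
A_D = block_diag(X) / np.sqrt(M)
eps = np.concatenate([Gj.reshape(-1) for Gj in G])    # row-major vectorization of the blocks

lhs = np.linalg.norm(Phi @ x) ** 2
rhs = np.linalg.norm(A_D @ eps) ** 2
print(np.isclose(lhs, rhs))
```

With this layout, the Frobenius norm of `A_D` equals the Euclidean norm of `x`, and its spectral norm equals the largest block norm of `x` divided by $\sqrt{M}$, which are the two facts used later when bounding $d_F$ and $d_2$.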
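Next, for RBD matrices, we observe that

$$\begin{aligned}
\|\Phi \cdot x(\alpha)\|_2^2 &= \sum_{j \in [J]} \|\Phi_1 \cdot x_j(\alpha)\|_2^2 \\
&= \|\Phi_1 \cdot X_R(\alpha)\|_F^2 \\
&= \|X_R^*(\alpha) \cdot \Phi_1^*\|_F^2 \\
&= \frac{1}{M} \sum_{m \in [M]} \|X_R^*(\alpha) \cdot \varepsilon'_m\|_2^2 \\
&= \|A_R(\alpha) \cdot \varepsilon'\|_2^2,
\end{aligned} \tag{20}$$

where in the last line we defined the linear map $A_R : \mathbb{C}^{\tilde{N}} \to \mathbb{C}^{JM \times MN}$ as

$$A_R(\alpha) = A_R(\alpha, U) := \frac{1}{\sqrt{M}} \begin{bmatrix} X_R^*(\alpha) & & \\ & \ddots & \\ & & X_R^*(\alpha) \end{bmatrix}, \tag{21}$$

and the entries of $\varepsilon'_m \in \mathbb{R}^{N}$, $m \in [M]$, and $\varepsilon' \in \mathbb{R}^{MN}$ are i.i.d. zero-mean, unit-variance random
variables with sub-Gaussian norm of $\tau$. So, we have also managed to express the RBD problem in
the setting of Theorem 3. The next two subsections are concerned with estimating the quantities
involved in Theorem 3 for both the DBD and RBD problems.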

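5.2 Calculating $d_2(\mathcal{A}_D)$, $d_F(\mathcal{A}_D)$, and $\gamma_2(\mathcal{A}_D, \|\cdot\|_2)$

We begin with defining the following norm on $\mathbb{C}^{\tilde{N}}$, which will find extensive use in the analysis of
the DBD problem:

$$\|\alpha\|_{A_D} := \|A_D(\alpha)\|_2 \tag{22}$$

for $\alpha \in \mathbb{C}^{\tilde{N}}$. We record a useful property of this norm below.

Lemma 5. For every $\alpha \in \mathbb{C}^{\tilde{N}}$, it holds that

$$\|\alpha\|_{A_D} \le \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \|\alpha\|_1. \tag{23}$$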
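Proof. Let $u_{j,n}$, $j \in [J]$ and $n \in [N]$, denote the $((j-1)N + n)$th row of U. We then have that

$$\begin{aligned}
\|\alpha\|_{A_D} &= \|A_D(\alpha)\|_2 \\
&= \|A_D(\alpha) A_D^*(\alpha)\|_2^{1/2} \\
&= \frac{1}{\sqrt{M}} \max_{j \in [J]} \|x_j(\alpha)\|_2 \\
&= \frac{1}{\sqrt{M}} \max_{j \in [J]} \|U_j \alpha\|_2 \\
&\le \sqrt{\frac{N}{M}} \max_{j \in [J],\, n \in [N]} |u_{j,n} \alpha| \\
&\le \sqrt{\frac{N}{M}} \max_{j \in [J],\, n \in [N]} \|u_{j,n}\|_\infty \cdot \|\alpha\|_1 \\
&= \frac{\mu}{\sqrt{\tilde{M}}} \|\alpha\|_1,
\end{aligned}$$

where $U_j$ collects the rows $u_{j,1}, \dots, u_{j,N}$, the second to last line uses the Holder inequality and the last line follows from the definition
of $\mu$. On the other hand, one may also write that

$$\|\alpha\|_{A_D} = \frac{1}{\sqrt{M}} \max_{j \in [J]} \|x_j(\alpha)\|_2 \le \frac{1}{\sqrt{M}} \|x(\alpha)\|_2 = \frac{1}{\sqrt{M}} \|U\alpha\|_2 = \frac{1}{\sqrt{M}} \|\alpha\|_2 \le \frac{1}{\sqrt{M}} \|\alpha\|_1,$$

where we used the fact that U is an orthobasis. Overall, we arrive at

$$\|\alpha\|_{A_D} \le \frac{\min(\mu, \sqrt{J})}{\sqrt{\tilde{M}}} \|\alpha\|_1 = \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \cdot \|\alpha\|_1, \tag{24}$$

as claimed. The equality above follows from the definition of $\tilde{\mu}$. □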
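The bound in Lemma 5 lends itself to a quick numerical sanity check. The sketch below assumes the conventions that make the two bounds in the proof combine, namely a coherence of the form $\sqrt{\tilde{N}} \max_{p,q} |U(p,q)|$, its truncation at $\sqrt{J}$, and $\tilde{M} = JM$; the function names are hypothetical.

```python
import numpy as np

def dbd_norm(U, alpha, J, M):
    """(1/sqrt(M)) * max_j ||U_j alpha||_2, where U_j is the j-th row block of U."""
    x = (U @ alpha).reshape(J, -1)            # row j holds the block x_j
    return np.linalg.norm(x, axis=1).max() / np.sqrt(M)

def mu_tilde(U, J):
    """min(mu(U), sqrt(J)) with mu(U) = sqrt(N_total) * max |U(p, q)| (assumed definition)."""
    mu = np.sqrt(U.shape[0]) * np.abs(U).max()
    return min(mu, np.sqrt(J))

rng = np.random.default_rng(0)
N, J, M = 8, 4, 3
Nt = N * J
bases = {
    "canonical": np.eye(Nt),
    "fourier": np.fft.fft(np.eye(Nt)) / np.sqrt(Nt),   # unitary DFT matrix
}
for name, U in bases.items():
    alpha = rng.standard_normal(Nt)
    lhs = dbd_norm(U, alpha, J, M)
    rhs = mu_tilde(U, J) / np.sqrt(J * M) * np.abs(alpha).sum()
    print(name, mu_tilde(U, J), lhs <= rhs + 1e-12)
```

Under these assumptions the canonical basis attains the largest possible factor $\sqrt{J}$, while the Fourier basis attains the smallest possible factor 1.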

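We continue with computing the quantities involved in Theorem 3 in the case of the DBD
problem. First, we have that

$$d_F(\mathcal{A}_D) = \sup_{A_D(\alpha) \in \mathcal{A}_D} \|A_D(\alpha)\|_F = \sup_{\alpha \in \Omega_S} \|x(\alpha)\|_2 = \sup_{\alpha \in \Omega_S} \|U\alpha\|_2 = \sup_{\alpha \in \Omega_S} \|\alpha\|_2 = 1. \tag{25}$$

The second to last equality holds because U is an orthonormal basis. Second, we have that

$$d_2(\mathcal{A}_D) = \sup_{A_D(\alpha) \in \mathcal{A}_D} \|A_D(\alpha)\|_2 = \sup_{\alpha \in \Omega_S} \|\alpha\|_{A_D} \le \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \sup_{\alpha \in \Omega_S} \|\alpha\|_1 \le \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}}. \tag{26}$$

The first inequality above holds on account of Lemma 5. The second inequality above follows
because $\|\alpha\|_2 = 1$ and $\|\alpha\|_0 \le S$ when $\alpha \in \Omega_S$. It is only left to bound $\gamma_2(\mathcal{A}_D, \|\cdot\|_2)$. According
to Lemma 4, we have that

$$\begin{aligned}
\gamma_2(\mathcal{A}_D, \|\cdot\|_2) &\lesssim \int_0^\infty \log^{1/2}\left( \mathcal{N}(\mathcal{A}_D, \|\cdot\|_2, \nu) \right) d\nu \\
&= \int_0^\infty \log^{1/2}\left( \mathcal{N}(\Omega_S, \|\cdot\|_{A_D}, \nu) \right) d\nu,
\end{aligned}$$

where the isometry between $\mathcal{A}_D$ (with metric $\|\cdot\|_2$) and $\Omega_S$ (with metric $\|\cdot\|_{A_D}$) implies the second
line. This isometry, in turn, follows from (22) and the linearity of $A_D(\cdot)$. Consequently,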



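$$\begin{aligned}
\gamma_2(\mathcal{A}_D, \|\cdot\|_2) &\lesssim \int_0^\infty \log^{1/2}\left( \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \frac{\nu}{\sqrt{S}} \right) \right) d\nu \\
&= \sqrt{S} \int_0^\infty \log^{1/2}\left( \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \nu \right) \right) d\nu,
\end{aligned} \tag{27}$$

where the first line uses the second inequality in (47) and the last line follows from a change of
variables in the integral. An estimate of the covering number involved in (27) can be found through
the next result, which is proved in Appendix E.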



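Lemma 6. Consider a norm $\|\cdot\|_A$ on $\mathbb{C}^{\tilde{N}}$ that, for every $\alpha \in \mathbb{C}^{\tilde{N}}$, satisfies

$$\|\alpha\|_A = \|A(\alpha)\|_2 \le \frac{\kappa}{\sqrt{\tilde{M}}} \|\alpha\|_1$$

for some linear map $A(\cdot)$ on $\mathbb{C}^{\tilde{N}}$ whose values are $N' \times N'$ matrices with rank of at most $\tilde{M}$, some $\kappa > 0$, and some integer $N'$.
Then, for $0 < \nu < \kappa/\sqrt{\tilde{M}}$ and $\tilde{M} > 1$, we have that

$$\log \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right) \lesssim \min\left( S \log \tilde{N} + S \log\left( 1 + \frac{2\kappa}{\nu\sqrt{\tilde{M}}} \right),\ \frac{\kappa^2}{\nu^2 \tilde{M}} \cdot \log^2 \tilde{N} \right). \tag{28}$$

When $\nu \ge \kappa/\sqrt{\tilde{M}}$, we have $\mathcal{N}(\Omega_S/\sqrt{S}, \|\cdot\|_A, \nu) = 1$.

Qualitatively speaking, of the two bounds on the right hand of (28), the first is tighter when
$\nu$ is small while the second is more effective for larger values of $\nu$. Of course, $\|\cdot\|_{A_D}$ satisfies the
hypothesis of Lemma 6 with $\kappa = \tilde{\mu}$ and the map $A_D(\cdot)$. Consequently, for $0 < \nu_0 < \tilde{\mu}/\sqrt{\tilde{M}}$ to be
set later, we have that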



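$$\begin{aligned}
\int_0^\infty \log^{1/2}\left( \mathcal{N}\left( \tfrac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \nu \right) \right) d\nu
&= \int_0^{\nu_0} \log^{1/2}\left( \mathcal{N}\left( \tfrac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \nu \right) \right) d\nu + \int_{\nu_0}^{\tilde{\mu}/\sqrt{\tilde{M}}} \log^{1/2}\left( \mathcal{N}\left( \tfrac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \nu \right) \right) d\nu \\
&\lesssim \int_0^{\nu_0} \sqrt{S \log \tilde{N} + S \log\left( 1 + \frac{2\tilde{\mu}}{\nu\sqrt{\tilde{M}}} \right)}\, d\nu + \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \log \tilde{N} \int_{\nu_0}^{\tilde{\mu}/\sqrt{\tilde{M}}} \frac{d\nu}{\nu} \\
&\lesssim \nu_0 \sqrt{S \log \tilde{N}} + \nu_0 \sqrt{S \log\left( 1 + \frac{2\tilde{\mu}}{\nu_0\sqrt{\tilde{M}}} \right)} + \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \log \tilde{N} \log\left( \frac{\tilde{\mu}}{\nu_0 \sqrt{\tilde{M}}} \right).
\end{aligned} \tag{29}$$

The second line above follows from the second statement in Lemma 6. In the third line, different
upper bounds from (28) are used to bound each summand. We benefited from (49) in the Toolbox
to compute the logarithmic integral in the third line. With the choice of $\nu_0 = \tilde{\mu}/\sqrt{S\tilde{M}}$, we obtain
that

$$\begin{aligned}
\int_0^\infty \log^{1/2}\left( \mathcal{N}\left( \tfrac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_D}, \nu \right) \right) d\nu
&\lesssim \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \sqrt{\log \tilde{N}} + \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \sqrt{\log(1 + 2\sqrt{S})} + \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \log S \log \tilde{N} \\
&\lesssim \frac{\tilde{\mu}}{\sqrt{\tilde{M}}} \log S \log \tilde{N}
\end{aligned} \tag{30}$$

for S > 1. Now plugging back (30) into (27), we arrive at

$$\gamma_2(\mathcal{A}_D, \|\cdot\|_2) \lesssim \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \log S \log \tilde{N}. \tag{31}$$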



Before completing the analysis of the DBD problem, let us calculate the same quantities for the 
RBD case. 



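5.3 Calculating $d_2(\mathcal{A}_R)$, $d_F(\mathcal{A}_R)$, and $\gamma_2(\mathcal{A}_R, \|\cdot\|_2)$

Again, we first introduce the following norm on $\mathbb{C}^{\tilde{N}}$, which will be useful in the analysis of the RBD
problem:

$$\|\alpha\|_{A_R} := \|A_R(\alpha)\|_2 \tag{32}$$

for $\alpha \in \mathbb{C}^{\tilde{N}}$. This norm has the following property.

Lemma 7. For every $\alpha \in \mathbb{C}^{\tilde{N}}$, it holds that

$$\|\alpha\|_{A_R} \le \frac{\gamma}{\sqrt{\tilde{M}}} \|\alpha\|_1.$$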

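Proof. Note that

$$\begin{aligned}
\|\alpha\|_{A_R} &= \|A_R(\alpha)\|_2 \\
&= \frac{1}{\sqrt{M}} \|X_R(\alpha)\|_2 \\
&= \frac{1}{\sqrt{M}} \Big\| \sum_{n \in [\tilde{N}]} \alpha(n) X_R(e_n) \Big\|_2 \\
&\le \frac{1}{\sqrt{M}} \sum_{n \in [\tilde{N}]} |\alpha(n)| \cdot \|X_R(e_n)\|_2 \\
&\le \frac{1}{\sqrt{M}} \max_{n \in [\tilde{N}]} \|X_R(e_n)\|_2 \cdot \|\alpha\|_1 \\
&= \frac{\gamma}{\sqrt{\tilde{M}}} \cdot \|\alpha\|_1.
\end{aligned} \tag{33}$$

The third line above uses the linearity of $X_R(\cdot)$. The fourth line follows from the triangle inequality
and the fifth line is implied by the Holder inequality. We made use of the definition of $\gamma$ to get the
last line. □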
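The chain of inequalities in this proof can be illustrated numerically. The sketch below assumes that the matrix associated with a coefficient vector is obtained by reshaping $U\alpha$ into an $N \times J$ matrix whose columns are the J signal blocks, and that the coherence factor is $\sqrt{J}$ times the largest spectral norm over the canonical vectors; both conventions and the function names are assumptions of the sketch.

```python
import numpy as np

def X_R(U, alpha, J):
    """Reshape x = U alpha into the N x J matrix [x_1, ..., x_J]."""
    return (U @ alpha).reshape(J, -1).T

def gamma(U, J):
    """sqrt(J) * max_n ||X_R(e_n)||_2 (the sqrt(J) normalization is assumed)."""
    Nt = U.shape[1]
    return np.sqrt(J) * max(np.linalg.norm(X_R(U, e, J), 2) for e in np.eye(Nt))

rng = np.random.default_rng(0)
N, J, M = 6, 3, 2
Nt = N * J
bases = {
    "canonical": np.eye(Nt),
    "fourier": np.fft.fft(np.eye(Nt)) / np.sqrt(Nt),
}
for name, U in bases.items():
    alpha = rng.standard_normal(Nt)
    lhs = np.linalg.norm(X_R(U, alpha, J), 2) / np.sqrt(M)
    rhs = gamma(U, J) / np.sqrt(J * M) * np.abs(alpha).sum()
    print(name, gamma(U, J), lhs <= rhs + 1e-12)
```

Under these conventions both the canonical and the Fourier basis give the factor $\sqrt{J}$, since each of their columns reshapes into a rank-one matrix of unit Frobenius norm.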
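We now continue with computing the quantities involved in Theorem 3 in the case of the RBD
problem. First, we have that

$$d_F(\mathcal{A}_R) = \sup_{A_R(\alpha) \in \mathcal{A}_R} \|A_R(\alpha)\|_F = \sup_{\alpha \in \Omega_S} \|X_R(\alpha)\|_F = \sup_{\alpha \in \Omega_S} \|x(\alpha)\|_2 = \sup_{\alpha \in \Omega_S} \|\alpha\|_2 = 1. \tag{34}$$

Second, we have that

$$d_2(\mathcal{A}_R) = \sup_{A_R(\alpha) \in \mathcal{A}_R} \|A_R(\alpha)\|_2 = \sup_{\alpha \in \Omega_S} \|\alpha\|_{A_R} \le \frac{\gamma}{\sqrt{\tilde{M}}} \sup_{\alpha \in \Omega_S} \|\alpha\|_1 \le \gamma \sqrt{\frac{S}{\tilde{M}}}, \tag{35}$$

where the first inequality follows from Lemma 7. The last quantity to estimate is $\gamma_2(\mathcal{A}_R, \|\cdot\|_2)$.
As was the case in the previous subsection, we may write that

$$\gamma_2(\mathcal{A}_R, \|\cdot\|_2) \lesssim \sqrt{S} \int_0^\infty \log^{1/2}\left( \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_{A_R}, \nu \right) \right) d\nu. \tag{36}$$

Of course, $\|\cdot\|_{A_R}$ satisfies the hypothesis of Lemma 6 with $\kappa = \gamma$ and the map $A_R(\cdot)$. Therefore,
following the same steps as in the previous subsection, we arrive at

$$\gamma_2(\mathcal{A}_R, \|\cdot\|_2) \lesssim \gamma \sqrt{\frac{S}{\tilde{M}}} \log S \log \tilde{N}. \tag{37}$$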



5.4 Denouement 

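We notice that the quantities $d_F(\mathcal{A}_D)$, $d_2(\mathcal{A}_D)$, and $\gamma_2(\mathcal{A}_D, \|\cdot\|_2)$ have the same bounds as their
counterparts $d_F(\mathcal{A}_R)$, $d_2(\mathcal{A}_R)$, and $\gamma_2(\mathcal{A}_R, \|\cdot\|_2)$ except for the type of the coherence factor involved.
Therefore, it suffices to focus on one; the same result holds for the other with its corresponding
coherence factor. Given $\delta < 1$, assume that

$$\tilde{M} \gtrsim_\tau \delta^{-2} \tilde{\mu}^2 \cdot S \log^2 S \log^2 \tilde{N}. \tag{38}$$

Equipped with the estimates in Section 5.2, i.e., (25), (26), and (31), we now compute $E_1$ in
Theorem 3:

$$\begin{aligned}
E_1 &:= \gamma_2(\mathcal{A}_D, \|\cdot\|_2) \left( \gamma_2(\mathcal{A}_D, \|\cdot\|_2) + d_F(\mathcal{A}_D) \right) + d_F(\mathcal{A}_D)\, d_2(\mathcal{A}_D) \\
&\lesssim \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \log S \log \tilde{N} \left( \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \log S \log \tilde{N} + 1 \right) + \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \\
&\lesssim_\tau \delta(\delta + 1) + \frac{\delta}{\log S \log \tilde{N}} \\
&\le 2\delta + \frac{\delta}{\log S \log \tilde{N}},
\end{aligned}$$

where we assumed that S > 1 and used the hypothesis that $\delta < 1$ in the last line. Doing the same
to $E_2$, we obtain that

$$\begin{aligned}
E_2 &:= d_2(\mathcal{A}_D) \left( \gamma_2(\mathcal{A}_D, \|\cdot\|_2) + d_F(\mathcal{A}_D) \right) \\
&\lesssim \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \left( \tilde{\mu} \sqrt{\frac{S}{\tilde{M}}} \log S \log \tilde{N} + 1 \right) \\
&\lesssim_\tau \frac{\delta(\delta + 1)}{\log S \log \tilde{N}} \\
&\lesssim \frac{\delta}{\log S \log \tilde{N}}.
\end{aligned}$$

Similarly for $E_3$, we write that

$$E_3 := d_2^2(\mathcal{A}_D) \le \tilde{\mu}^2 \frac{S}{\tilde{M}} \lesssim_\tau \frac{\delta^2}{\log^2 S \log^2 \tilde{N}}.$$

Plugging the above estimates of $E_1$, $E_2$, and $E_3$ into the tail bound in Theorem 3, we obtain that

$$\log \mathbb{P}\left\{ \sup_{\alpha \in \Omega_S} \left| \|\Phi \cdot x(\alpha)\|_2^2 - 1 \right| \gtrsim_\tau \delta + t \right\} \lesssim_\tau - \min\left( t^2 \delta^{-2} \log^2 S \log^2 \tilde{N},\ t\, \delta^{-2} \log^2 S \log^2 \tilde{N} \right).$$

Substituting $t = \delta$, we arrive at

$$\log \mathbb{P}\left\{ \sup_{\alpha \in \Omega_S} \left| \|\Phi \cdot x(\alpha)\|_2^2 - 1 \right| \gtrsim_\tau \delta \right\} \lesssim_\tau - \log^2 S \log^2 \tilde{N},$$

assuming that S > 1. After absorbing the factor depending on $\tau$ into (a redefined) $\delta$, we finally
arrive at

$$\log \mathbb{P}\left\{ \sup_{\alpha \in \Omega_S} \left| \|\Phi \cdot x(\alpha)\|_2^2 - 1 \right| > \delta \right\} \lesssim_\tau - \log^2 S \log^2 \tilde{N},$$

which completes the proof of Theorem 1. Replacing $\tilde{\mu}$ with $\gamma$ and repeating this argument concludes
the proof of Theorem 2.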

6 Conclusion 

In this paper, we studied two important classes of structured random matrices, namely DBD and 
RBD matrices. Our main results state that matrices with block diagonal constructions can indeed 
satisfy the RIP but that the requisite number of measurements depends on certain properties of the 
sparsifying basis. These properties were detailed and carefully interpreted in the paper. In the best 
case, DBD and RBD matrices perform nearly as well as the dense i.i.d. random matrices generally 
used in CS despite having many fewer nonzero entries. Moreover, we have shown that random block 
diagonal matrices are intimately related to the random convolution and random Toeplitz matrices 
considered in the literature. 
Our findings lead us to conclude that structured random matrices can be useful in sensing
architectures as long as the statistics of the data are well-matched with the structure of the measurement
matrix. While this intuition is similar to other results on structured measurement matrices, our 
results on block diagonal matrix constructions are novel in extending this intuition to matrices 
with (potentially) many entries that are zero. The approach required to reach our results also leads 
us to conclude that future progress in the field of probability in Banach spaces is likely to play a 
significant role in establishing optimal performance guarantees for other structured measurement 
systems. This may be especially true given the improved performance we were able to achieve even 
over other bounding techniques that require sophisticated mathematical machinery. Finally, while 
we remain uncertain about the necessity of the poly-logarithmic factors in the final measurement 
rates (which may be a proof artifact), the simulation results display the dependence on the sparsity 
basis (through the coherence) present in our main results. Despite the fact that the simulation 
results address average case behavior (as opposed to worst-case behavior captured by the RIP), 
these results also lead to the conclusion that this dependence on coherence is qualitatively true and 
not simply a proof artifact. 

There are several directions that can be explored in the future. First, as we have discussed in the 
introduction, block diagonal matrices are useful for modeling distributed measurement systems. It 
may therefore be of interest to specialize our results to some particular distributed systems. Take 
for example MIMO radar where multiple independent transmitters and receivers are arbitrarily 
distributed over an area of interest to sense targets in a scene. Data from the receivers with 
potentially high data rates needs to be sent to a central processor and to be coherently processed 
to achieve a maximum processing gain. The block diagonal structure studied here is potentially 
useful to analyze the possibility of compressing the data at the individual receivers before sending it 
to the central processor. For another example, block diagonal structure has also been exploited in 
observability studies of dynamical systems [29]. Our understanding of this and similar problems [21 
El [9] may be enhanced using the results in this paper. 
A Toolbox



This section collects a few general results that are used throughout the paper (mainly without 
proofs, for the benefit of the space). 

Schatten norms possess the following useful property that mirrors Euclidean norms:

$$\mathrm{Rank}(A)^{\frac{1}{p} - \frac{1}{q}}\, \|A\|_{S_q} \le \|A\|_{S_p} \le \|A\|_{S_q}, \tag{39}$$

for a matrix A, when $1 \le q \le p$. The following version of the Holder inequality for matrices is used
in this paper. For any pair of matrices A, B (such that AB exists), the following holds:⁶

$$\|AB\|_F \le \|A\|_2 \|B\|_F. \tag{40}$$

For a random variable Z and $1 \le p \le q$, the following holds [21, Page 30]:

$$\mathbb{E}_p |Z| \le \mathbb{E}_q |Z|. \tag{41}$$

Also, $\|C \cdot Z\|_{\psi_2} = |C|\, \|Z\|_{\psi_2}$ and, according to [27, Lemma 5.9], the following holds for a finite
sequence of zero-mean independent random variables $\{Z_j\}$:

$$\Big\| \sum_j Z_j \Big\|_{\psi_2}^2 \lesssim \sum_j \|Z_j\|_{\psi_2}^2. \tag{42}$$

Throughout this section, let $g \in \mathbb{R}^N$ denote a vector whose entries are i.i.d. zero-mean unit
variance sub-Gaussian random variables. For convenience, set $K := \max_{n \in [N]} \|g(n)\|_{\psi_2}$. For t > 0
and $n \in [N]$, the following holds by the definition of the sub-Gaussian norm [27]:

$$\log \mathbb{P}\{ |g(n)| > t \} \lesssim - \frac{t^2}{K^2}. \tag{43}$$

Also, the following holds when N > 1, [27, Lemma 6.6]:

$$\mathbb{E}_2 \|g\|_\infty = \Big( \mathbb{E} \max_{n \in [N]} g^2(n) \Big)^{1/2} \lesssim K \sqrt{\log N}. \tag{44}$$

Furthermore, for t < 1, the following inequality is from [27, Corollary 5.17] and provides a lower
bound on $\|g\|_2$:

$$\log \mathbb{P}\left\{ \|g\|_2 < (1 - t)\sqrt{N} \right\} \lesssim - \min\left( \frac{t^2}{K^4}, \frac{t}{K^2} \right) N. \tag{45}$$

Suppose that the entries of $G \in \mathbb{R}^{N \times J}$ are Gaussian random variables with zero-mean and unit
variance. The next inequality provides an upper bound on the spectral norm of G, which directly
follows from Corollary 5.35 in [27]. When $J \le N$, and for t > 0, the following holds:

$$\log \mathbb{P}\left\{ \|G\|_2 > (1 + t)\sqrt{N} + \sqrt{J} \right\} \le -t^2 N / 2. \tag{46}$$

The next result essentially bounds the moments of a sum of independent random variables with
those of a Rademacher sequence. The proof of this result (and the next one) uses the symmetrization
technique, which has the following argument at its heart. If Z is a random variable taking values
in a Banach space $\mathcal{X}$, then we can define its symmetrized version $Y = Z - Z'$, where $Z'$ is an
independent copy of Z. As suggested by the name, the distribution of Y is symmetric about the
origin. In addition, the distributions of Y and $\xi Y$ are the same, where $\xi$ is a standard Bernoulli
random variable that takes ±1 with equal probability.

6 Let $\{b_j\}$ denote the columns of B. Then $\|AB\|_F^2 = \sum_j \|A b_j\|_2^2 \le \|A\|_2^2 \sum_j \|b_j\|_2^2 = \|A\|_2^2 \|B\|_F^2$.

Lemma 8. [21, Lemma 6.7] Let $\{Z_j\}$ be a finite sequence of independent random variables in the
Banach space $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$. Then, the following holds:



IIX' 



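The next lemma links the tail bounds of a sum of independent random variables and its
symmetrized version. We remark that this result directly follows by applying equation 6.1 in [18] to
$\sum_j (Z_j - \mathbb{E} Z_j)$.

Lemma 9. Let $\{Z_j\}$ be defined as above, and let $\{Z'_j\}$ denote an independent copy of $\{Z_j\}$. The
following holds for t > 0:

$$\mathbb{P}\left\{ \Big\| \sum_j (Z_j - \mathbb{E} Z_j) \Big\|_{\mathcal{X}} > 2\, \mathbb{E} \Big\| \sum_j (Z_j - \mathbb{E} Z_j) \Big\|_{\mathcal{X}} + t \right\} \le 2\, \mathbb{P}\left\{ \Big\| \sum_j (Z_j - Z'_j) \Big\|_{\mathcal{X}} > t \right\}.$$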

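In our proofs, we also require a (weak version of) the Khintchine inequality for operator norms
that we state next.

Lemma 10. If $\{A_l\}$, $l \in [L]$, is a sequence of matrices of the same dimension and rank of at most
K > 1, then the following holds:

$$\mathbb{E} \Big\| \sum_{l \in [L]} \xi_l A_l \Big\|_2 \lesssim \sqrt{\log K} \Big( \sum_{l \in [L]} \|A_l\|_2^2 \Big)^{1/2}.$$

Proof. From [21, Theorem 6.14] and for every $2 \le p < \infty$, we recall that

$$\Big( \mathbb{E} \Big\| \sum_{l \in [L]} \xi_l A_l \Big\|_{S_p}^p \Big)^{1/p} \le \sqrt{p} \cdot \max\left\{ \Big\| \Big( \sum_{l \in [L]} A_l A_l^* \Big)^{1/2} \Big\|_{S_p},\ \Big\| \Big( \sum_{l \in [L]} A_l^* A_l \Big)^{1/2} \Big\|_{S_p} \right\}.$$

The spectral norm is a special case of the Schatten norm with $p = \infty$. Therefore, the inequality
above does not directly apply to our problem. As such, we need a more detailed argument here,
which follows the approach of [27]. From (39), with $p = \infty$ and $q = \log K$, recall that

$$e^{-1} \|A\|_{S_{\log K}} \le \|A\|_2 \le \|A\|_{S_{\log K}}.$$

This equivalence, in combination with the fact that $\mathbb{E}_p$ is increasing in p, i.e., (41), allows us to
write

$$\begin{aligned}
\mathbb{E} \Big\| \sum_{l \in [L]} \xi_l A_l \Big\|_2
&\le \mathbb{E} \Big\| \sum_{l \in [L]} \xi_l A_l \Big\|_{S_{\log K}} \\
&\le \mathbb{E}_{\log K} \Big\| \sum_{l \in [L]} \xi_l A_l \Big\|_{S_{\log K}} \\
&\le \sqrt{\log K} \max\left\{ \Big\| \Big( \sum_{l} A_l A_l^* \Big)^{1/2} \Big\|_{S_{\log K}},\ \Big\| \Big( \sum_{l} A_l^* A_l \Big)^{1/2} \Big\|_{S_{\log K}} \right\} \\
&\le e \sqrt{\log K} \max\left\{ \Big\| \Big( \sum_{l} A_l A_l^* \Big)^{1/2} \Big\|_2,\ \Big\| \Big( \sum_{l} A_l^* A_l \Big)^{1/2} \Big\|_2 \right\} \\
&= e \sqrt{\log K} \max\left\{ \Big\| \sum_{l} A_l A_l^* \Big\|_2^{1/2},\ \Big\| \sum_{l} A_l^* A_l \Big\|_2^{1/2} \right\} \\
&\le e \sqrt{\log K} \max\left\{ \Big( \sum_{l} \|A_l\|_2^2 \Big)^{1/2},\ \Big( \sum_{l} \|A_l\|_2^2 \Big)^{1/2} \right\} \\
&= e \sqrt{\log K} \cdot \Big( \sum_{l \in [L]} \|A_l\|_2^2 \Big)^{1/2},
\end{aligned}$$

as claimed. We assumed above that K > e to produce the first line, and the second to the last
line above uses the triangle inequality and the fact that $\|AA^*\|_2 = \|A\|_2^2$ for any matrix A. We
remark that had we stopped at the fifth line above, we would have ended with the stronger original
non-commutative Khintchine inequality for the spectral norm. However, the weaker bound given
in the last line suffices for our purposes in this paper and completes the proof of Lemma 10. □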

We also list two trivial identities regarding covering numbers, which hold for every set S, norm
$\|\cdot\|$, and r, a > 0:

$$\mathcal{N}(S, a\|\cdot\|, r) = \mathcal{N}(S, \|\cdot\|, r/a),$$
$$\mathcal{N}(aS, \|\cdot\|, r) = \mathcal{N}(S, \|\cdot\|, r/a).$$

For reference, we also provide an estimation for two integrals we encounter in the analysis: 

logf 1 + - ] du<alog( 1 + - 



log 1 + - du<aJlog 1 + 



(47) 

(48) 
(49) 



Both inequalities hold when a <b. 
B Proof of Lemma 1



Equivalently, R can be constructed as follows. Let $\{r_n\}$, $n \in [\tilde{N}]$, denote the columns of R. The
first column, $r_1$, is chosen from the uniform distribution on the unit sphere in $\mathbb{R}^{\tilde{N}}$. For every
$n \in [\tilde{N}] \setminus \{1\}$, $r_n$ is chosen from the uniform distribution on the unit sphere in the orthogonal
complement of the span of the first n − 1 columns.
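A generic orthobasis of this kind is easy to draw numerically. The sketch below uses a QR factorization of a Gaussian matrix with a sign correction, which is a standard way of sampling from the uniform (Haar) distribution on orthogonal matrices and is swapped in here for the column-by-column description above. The coherence is computed as $\sqrt{n}$ times the largest entry magnitude, which is an assumed normalization, and the function names are hypothetical.

```python
import numpy as np

def random_orthobasis(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    # Fixing the signs of the diagonal of R makes the distribution of Q uniform.
    return Q * np.sign(np.diag(R))

def coherence(U):
    """sqrt(n) * max |U(p, q)| (assumed normalization of the coherence)."""
    return np.sqrt(U.shape[0]) * np.abs(U).max()

rng = np.random.default_rng(0)
for n in (64, 256, 1024):
    R = random_orthobasis(n, rng)
    print(n, coherence(R), np.sqrt(np.log(n)))
```

Printing the coherence next to $\sqrt{\log n}$ gives a feel for the kind of scaling that the lemma quantifies; the sketch makes no claim about the constants involved.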

Let $g \in \mathbb{R}^{\tilde{N}}$ denote a standard Gaussian vector, that is, a vector whose entries are i.i.d. zero-mean
Gaussian random variables with unit variance. Since $r_1$ is drawn from the uniform distribution on
the unit sphere in $\mathbb{R}^{\tilde{N}}$, the entries of $r_1$ have the same distribution as those in $g/\|g\|_2$. Since the
distribution of R remains unchanged under permutation of its rows, every column of R has the
same (marginal) distribution as $g/\|g\|_2$. This, alongside the union bound, allows us to write
the following for any t > 0:



P< /x(-R) > tylogN > = P< max || 

T*n 1 1 max 

> tyiogN/VN 

\ne[N] 



< N- maxP<j IHUax > ty'logN/VN 

n&[N] 



\q(n)\ /logiV 
N P{ max > t\ — %^ 



n€[N] \\9\\2 



N 



N 



\9\\2 



| g (i)l > */2- yiogiv 



iV 2 • P{ 

\9h (l-l/2)ViV 
< N 2 • P|b(l)| > t/2 • Tbg^j + iV 2 • P{ \\g\\ 2 < (1 - 1/2)71} 



_£2_, 2 



<iV 2 e 



2„ 4|| 9 (l) 



t z log N 



<P 2 



+ N 2 e 



- Cz min 



< iV e 



%^t 2 log AT 

2„ 4|| 9 (1)|| 2 ,. 6 



'^2 



+ N 2 e 



where we used (43) and (45) to bound the failure probability. The last line holds because ||p(l)||^ 2 
y/2/ir [j] If we take N > t 2 log N, we obtain that 



_C2 ,2 



P\^(R)>t^logN}<N 



We arrive at the advocated result when t > 1: 



4||fl(l)l 



2- .„ ?*., A t 2 



*2 +N 



4||ff(l)l 



V>2 



(50) 



P<^ n(R) > tJlog N } < iV"* + AT - * = 2N~ 



(51) 



7 This is easily verified using the moments of the Gaussian distribution. 
C Proof of Lemma 2

We use here the construction of R laid down in the beginning of Appendix B. As pointed out in the proof
of Lemma 1, the columns of R are dependent but identically distributed as $g/\|g\|_2$, where g was
defined there. Now, for every t, we can write



P{ max \\X(R,e n )\\ 2 
I He [AT] 



1 1 + t 

+ 



y/J 



< N ■ max P<> \\X(R, ez)\\ 2 > -4= + "' + ' 



n£[iV] 



1 



AT.p ||^(12,ei)|| 2 > -= + 



iV 

1 + t 



_ , . (52) 

The second line uses the union bound and the last line holds due to the identical distribution of
the columns of R. It remains to find an upper bound for the probability in the last line above.
Recall that $r_1$ has the same distribution as $g/\|g\|_2$, and thus $X(R, e_1)$ has the same distribution
as $G/\|G\|_F$, where $G \in \mathbb{C}^{N \times J}$ is formed by reshaping g. Therefore, $\|X(R, e_1)\|_2$ has the same
distribution as $\|G\|_2 / \|G\|_F$. For fixed t < 1, the following convenient inequality holds:⁸



1_ 1 + t > (l + t/3)VN + VJ 

VJ ~ r 



(l-t/3)VN 



N 



(53) 



Now, we can write that 

p{\\X(R, ei ) 



^ 1 1 

I2 > — + 



t 



G\ 




> 



N ' Vj 



N yfj 



\\Gh [1 + DVn + VJ 



> 



> 1 



<e~^ N + e 



<e is 1 " + e 




< 1 



-C3 min 



t 2 N 



where the second line uses (53). The fourth line uses the inequalities (45) and (46) and the fact
that $\|G\|_F = \|g\|_2$. The second to last line follows because $\|g(1)\|_{\psi_2} = \sqrt{2/\pi}$ and t < 1. The above
upper bound in combination with (52) leads us to the following conclusion:

-C 4 t 2 N 



P< i(R) > 1 + 



+ t><Ne 



8 This inequality easily follows from the fact that $1 + t \ge (1 + t/3)(1 - t/3)^{-1}$ holds for every t < 1.

Figure 3: A visual illustration of the transformation from T' to T for P = J = 5. See the
explanation in Appendix D.

We complete the proof of Lemma 2 by taking $N \ge 2 C_4^{-1} t^{-2} \log N$.

D Proof of Lemma 3

Let $T' := F_J \otimes I_P \in \mathbb{C}^{PJ \times PJ}$, where $\otimes$ stands for the Kronecker product. Here, $F_J$ and $I_P$ are the
Fourier and canonical orthobases of size, respectively, J × J and P × P. We consider the natural
partitioning of T' into J row-submatrices of size P × PJ, denoted by $T'_j$, $j \in [J]$. Now, for every
$j \in [J]$, cyclically shift the rows of $T'_j$ by j − 1 times upward and create $T_j \in \mathbb{C}^{P \times PJ}$. Then form
the matrix $T \in \mathbb{C}^{PJ \times PJ}$ by replacing every $T'_j$ with $T_j$. The transformation of T' to T is visualized
in Figure 3.

Using the properties of the Kronecker product, it is easily verified that T' is an orthobasis for
$\mathbb{C}^{PJ}$. Due to its structure, we also observe that the nonzero entries of the $p_1$th and $p_2$th columns of
T do not overlap when $(p_1 - p_2) \bmod P \ne 0$, and so these columns are orthogonal. Otherwise, when
$(p_1 - p_2) \bmod P = 0$, we need a more subtle argument. Under this condition, the inner product of
the $p_1$th and $p_2$th columns of T' equals the inner product of the $\lceil p_1/P \rceil$th and $\lceil p_2/P \rceil$th columns
of $F_J$ and is indeed zero. The inner product of the $p_1$th and $p_2$th columns remains zero under the
transformation of T' to T since this transformation only amounts to a permutation in the rows of
T'. Therefore, T is an orthobasis for $\mathbb{C}^{PJ}$. As for computing $\gamma(T)$, the structure of T guarantees
that, for each j, every column and row of $X(e_j, T)$ has only one nonzero entry with the magnitude
of $1/\sqrt{J}$. Therefore, $\gamma(T) = 1$.

Finally, it can be easily verified that $x/\sqrt{J} = T x_e$, where $x_e = [x^T, 0, 0, \cdots, 0]^T \in \mathbb{C}^{PJ}$, and by
(16), $Tx = HTx_e$. If x is S-sparse, so is $x_e$. This completes the proof of Lemma 3.
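The construction in this proof is concrete enough to verify numerically. The sketch below builds T' and T for small P and J (with J no larger than P), checks that T is unitary, and evaluates the coherence factor under the assumption that it equals $\sqrt{J}$ times the largest spectral norm of a column reshaped into a P × J matrix. It also checks the final identity, reading its left-hand side as the stack of the J cyclic shifts of x, which is an interpretation of the sketch; the function names are hypothetical.

```python
import numpy as np

def build_T(P, J):
    """Return (T', T): T' = F_J kron I_P, and T with block j shifted up by j rows."""
    F = np.fft.fft(np.eye(J)) / np.sqrt(J)          # unitary J x J Fourier matrix
    T_prime = np.kron(F, np.eye(P))
    T = np.empty_like(T_prime)
    for j in range(J):                              # zero-based j, i.e. a shift by j
        rows = slice(j * P, (j + 1) * P)
        T[rows] = np.roll(T_prime[rows], -j, axis=0)  # cyclic shift upward
    return T_prime, T

def gamma(U, J):
    """sqrt(J) * max over columns of the spectral norm of the reshaped column (assumed)."""
    return np.sqrt(J) * max(np.linalg.norm(c.reshape(J, -1).T, 2) for c in U.T)

P, J = 5, 3
T_prime, T = build_T(P, J)
rng = np.random.default_rng(0)
x = rng.standard_normal(P) + 1j * rng.standard_normal(P)
x_e = np.concatenate([x, np.zeros(P * (J - 1))])              # [x^T, 0, ..., 0]^T
stacked = np.concatenate([np.roll(x, -j) for j in range(J)])  # x, Sx, ..., S^(J-1) x

print(np.allclose(T.conj().T @ T, np.eye(P * J)))   # T is an orthobasis
print(gamma(T_prime, J), gamma(T, J))               # sqrt(J) before the shifts, 1 after
print(np.allclose(stacked / np.sqrt(J), T @ x_e))   # the sparse representation
```

The shifts are what spread the nonzero entries of each reshaped column over distinct rows, which is why the factor drops from $\sqrt{J}$ for T' to 1 for T in this sketch.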



E Proof of Lemma 6



In what follows, we let denote 



The arguments used in this section are largely adapted from 
the unit ball with respect to the norm || • \\a in C . Also, and B2 , respectively, denote the 
unit ^i-ball and ^2-ball in C^. For T C [N], we let B T A denote the unit ball in the #T-dimensional 
subspace of $\mathbb{C}^{\tilde{N}}$ spanned by $\{e_n\}$, $n \in T$. We define $B_1^T$ and $B_2^T$ similarly.

The first thing to notice is that when $\alpha \in \Omega_S/\sqrt{S}$, then $\|\alpha\|_1 \le 1$. From the hypothesis of the
lemma, we then have that
(54) 



\(x\\a < 



M 



This implies that for every support T C [N] with #T = S, we have 



&2 r- K kT 



'S - VM 

On the other hand, £ls/V^S can be equivalently represented as 



(55) 



n_s_ 



u 

#T=S 



(56) 



Together, (55) and (56) imply that 



c 



5 



U -k-i 



#T=S 



M 



We also record that 



which dictates that 



K 

/m 



X > 



Af(n s /Vs,\\-\\A,v) = i 



(57) 



(58) 



(59) 



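if $\nu \ge \kappa/\sqrt{\tilde{M}}$. This proves the second statement in Lemma 6. Otherwise, if $\nu < \kappa/\sqrt{\tilde{M}}$, we
continue with the rest of the argument. In light of (57), we have that

$$\begin{aligned}
\mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right)
&\le \mathcal{N}\Big( \bigcup_{\#T = S} \frac{\kappa}{\sqrt{\tilde{M}}} B_A^T, \|\cdot\|_A, \nu \Big) \\
&\le \binom{\tilde{N}}{S} \cdot \mathcal{N}\left( B_A^T, \|\cdot\|_A, \frac{\nu \sqrt{\tilde{M}}}{\kappa} \right),
\end{aligned} \tag{60}$$

where the last line uses the second inequality in (47). An estimate for the covering number in the
last line above is available for r > 0, namely:⁹

$$\mathcal{N}(B_A^T, \|\cdot\|_A, r) \le (1 + 2/r)^{2 \# T}. \tag{61}$$

9 This is proved similar to Lemma 5.2 in [27], after exchanging the Euclidean metric with $\|\cdot\|_A$ and accounting
for the complex vector space.

Plugging the bound above into (60), we arrive at

$$\mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right) \le \binom{\tilde{N}}{S} \left( 1 + \frac{2\kappa}{\nu \sqrt{\tilde{M}}} \right)^{2S} \le \left( \frac{e \tilde{N}}{S} \right)^S \left( 1 + \frac{2\kappa}{\nu \sqrt{\tilde{M}}} \right)^{2S}. \tag{62}$$

The last inequality holds because $\binom{n}{m} \le (en/m)^m$ for any pair of integers $m \le n$. When $\tilde{N} > 1$,
(62) implies that

$$\begin{aligned}
\log \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right)
&\le S \log\left( \frac{e \tilde{N}}{S} \right) + 2S \log\left( 1 + \frac{2\kappa}{\nu \sqrt{\tilde{M}}} \right) \\
&\lesssim S \log \tilde{N} + S \log\left( 1 + \frac{2\kappa}{\nu \sqrt{\tilde{M}}} \right).
\end{aligned} \tag{63}$$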



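The bound above is less effective for large values of $\nu$. To seek a second bound on the covering
number, we begin from the containment

$$\frac{\Omega_S}{\sqrt{S}} \subset B_1^{\tilde{N}}, \tag{64}$$

which immediately dictates that

$$\mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right) \le \mathcal{N}\left( B_1^{\tilde{N}}, \|\cdot\|_A, \nu \right). \tag{65}$$

In order to compute the covering number on the right hand side of (65), we use the following result,
which is proved in Appendix F.

Lemma 11. Let $B_1^{\tilde{N},\mathbb{R}}$ denote the unit $\ell_1$-ball in $\mathbb{R}^{\tilde{N}}$, and consider the norm $\|\cdot\|_A$ on $\mathbb{C}^{\tilde{N}}$, which satisfies
the hypothesis of Lemma 6. Naturally, $\|\cdot\|_A$ induces a norm on $\mathbb{R}^{\tilde{N}}$ (which is represented
with the same notation for convenience). For $\nu > 0$ and $\tilde{M} > 1$, it holds that

$$\log \mathcal{N}\left( B_1^{\tilde{N},\mathbb{R}}, \|\cdot\|_A, \nu \right) \lesssim \frac{\kappa^2}{\nu^2 \tilde{M}} \cdot \log^2 \tilde{N}.$$

Now consider an arbitrary $\beta \in B_1^{\tilde{N}}$, and note that $\mathrm{Re}[\beta], \mathrm{Im}[\beta] \in B_1^{\tilde{N},\mathbb{R}}$. Let

$$\mathcal{C}_1 := \mathcal{C}\left( B_1^{\tilde{N},\mathbb{R}}, \|\cdot\|_A, \nu/2 \right)$$

denote a minimal $(\nu/2)$-cover for $B_1^{\tilde{N},\mathbb{R}}$ with respect to the metric $\|\cdot\|_A$. Therefore, there exist
$p_1, p_2 \in \mathcal{C}_1$ such that $\|\mathrm{Re}[\beta] - p_1\|_A, \|\mathrm{Im}[\beta] - p_2\|_A \le \nu/2$. It follows by the triangle inequality that

$$\begin{aligned}
\|\beta - (p_1 + \mathrm{i} p_2)\|_A &= \|(\mathrm{Re}[\beta] - p_1) + \mathrm{i}(\mathrm{Im}[\beta] - p_2)\|_A \\
&\le \|\mathrm{Re}[\beta] - p_1\|_A + \|\mathrm{Im}[\beta] - p_2\|_A \\
&\le \nu.
\end{aligned}$$

Therefore, $\{p_1 + \mathrm{i} p_2 : p_1, p_2 \in \mathcal{C}_1\}$ is a cover for $B_1^{\tilde{N}}$, and clearly

$$\mathcal{N}\left( B_1^{\tilde{N}}, \|\cdot\|_A, \nu \right) \le \left( \mathcal{N}\left( B_1^{\tilde{N},\mathbb{R}}, \|\cdot\|_A, \nu/2 \right) \right)^2.$$

It now follows from (65), Lemma 11, and the inequality above that

$$\log \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right) \lesssim \frac{\kappa^2}{\nu^2 \tilde{M}} \cdot \log^2 \tilde{N}. \tag{66}$$

Combining (63) and (66), we finally arrive at

$$\log \mathcal{N}\left( \frac{\Omega_S}{\sqrt{S}}, \|\cdot\|_A, \nu \right) \lesssim \min\left( S \log \tilde{N} + S \log\left( 1 + \frac{2\kappa}{\nu \sqrt{\tilde{M}}} \right),\ \frac{\kappa^2}{\nu^2 \tilde{M}} \log^2 \tilde{N} \right),$$

which holds when $\tilde{M} > 1$. This completes the proof of Lemma 6.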



F Proof of Lemma 11

Consider an arbitrary $\beta$ in the real unit $\ell_1$-ball $B_1^{\tilde{N},\mathbb{R}}$. Also consider a random vector Z that takes a value in
$\{0\} \cup \{\mathrm{sgn}(\beta(n)) \cdot e_n\}$, $n \in [\tilde{N}]$. It takes $\mathrm{sgn}(\beta(n)) \cdot e_n$ with probability of $|\beta(n)|$ and zero otherwise.
Clearly, $\mathbb{E} Z = \beta$. Now, we wish to approximate $\beta$ with the average of L independent copies of Z,
denoted by $\{Z_l\}$, $l \in [L]$. The expected approximation error, measured in $\|\cdot\|_A$, would be



E 



ie[L] 



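Since the argument of the norm is zero-mean, we can use the symmetrization technique by invoking Lemma 8 from the Toolbox to obtain that

$$\mathbb{E} \left\| \beta - \frac{1}{L} \sum_{l \in [L]} Z_l \right\|_A = \frac{1}{L}\, \mathbb{E} \left\| \sum_{l \in [L]} \left( \mathbb{E} Z_l - Z_l \right) \right\|_A \le \frac{2}{L}\, \mathbb{E} \left\| \sum_{l \in [L]} \epsilon_l Z_l \right\|_A, \tag{67}$$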



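where $\{\epsilon_l\}$, $l \in [L]$, is a Rademacher sequence. According to Lemma 6, $\|\cdot\|_A = \|A(\cdot)\|_2$ and $A(\cdot)$ is a linear map. Therefore

$$\mathbb{E} \left\| \sum_{l \in [L]} \epsilon_l Z_l \right\|_A = \mathbb{E} \left\| \sum_{l \in [L]} \epsilon_l A(Z_l) \right\|_2.$$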



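Also, from the hypothesis of Lemma 6, $\mathrm{Rank}(A(Z_l)) \le M$ for every $l$. We now invoke a Khintchine-type inequality for the operator norm (stated in Lemma 10 from the Toolbox), which allows us to continue our argument as

$$\begin{aligned}
\mathbb{E} \left\| \sum_{l \in [L]} \epsilon_l A(Z_l) \right\|_2 &= \mathbb{E}_Z \mathbb{E}_\epsilon \left\| \sum_{l \in [L]} \epsilon_l A(Z_l) \right\|_2 \\
&\lesssim \sqrt{\log M} \cdot \mathbb{E}_Z \left( \sum_{l \in [L]} \|A(Z_l)\|_2^2 \right)^{1/2} \\
&\le \sqrt{\log M} \cdot \left( \mathbb{E} \sum_{l \in [L]} \|A(Z_l)\|_2^2 \right)^{1/2} \\
&= \sqrt{\log M} \cdot \left( \sum_{l \in [L]} \sum_{n \in [N]} |\beta(n)| \cdot \|A(e_n)\|_2^2 \right)^{1/2} \\
&\le \sqrt{\log M} \cdot \left( L \cdot \max_{n \in [N]} \|A(e_n)\|_2^2 \right)^{1/2} \\
&= \sqrt{L \log M} \cdot \max_{n \in [N]} \|A(e_n)\|_2 \\
&\le K \sqrt{\frac{L}{M} \log M},
\end{aligned} \tag{68}$$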



where, conditioned on $\{Z_l\}$, $l \in [L]$, the Khintchine inequality was used to produce the second line. The third line follows from the Jensen inequality. The fifth line uses the assumption that $\beta \in B_1^{\mathbb{R}^N}$. The last line follows from the hypothesis of Lemma 6. Using the above inequality in combination with (67) yields

$$\mathbb{E} \left\| \beta - \frac{1}{L} \sum_{l \in [L]} Z_l \right\|_A \lesssim K \sqrt{\frac{\log M}{L M}}.$$

To keep the average no larger than $\nu$, it suffices to take

$$L \ge \frac{C_5 K^2}{\nu^2 M} \log M.$$



With this choice of $L$, the expected approximation error is at most $\nu$, so at least one realization of the average of $L$ independent copies of $Z$ falls within a $\nu$ distance of $\beta$. In other words, we have shown that for an arbitrary $\beta \in B_1^{\mathbb{R}^N}$ there exists an average of $L$ elements of $\{0\} \cup \{\pm e_n\}$, $n \in [N]$, that is of distance at most $\nu$ from $\beta$. There are $2N + 1$ elements in the aforementioned set, and so there are at most $(2N + 1)^L$ possibilities for the average. We therefore conclude that

$$\mathcal{N}\left(B_1^{\mathbb{R}^N}, \|\cdot\|_A, \nu\right) \le (2N + 1)^{C_5 \log M \cdot K^2 / (\nu^2 M)},$$

or

$$\begin{aligned}
\log \mathcal{N}\left(B_1^{\mathbb{R}^N}, \|\cdot\|_A, \nu\right) &\le \frac{C_5 K^2}{\nu^2 M} \cdot \log M \log(2N + 1) \\
&\lesssim \frac{K^2}{\nu^2 M} \cdot \log M \log N \\
&\lesssim \frac{K^2}{\nu^2 M} \cdot \log^2 N
\end{aligned}$$

when $N > 1$. This completes the proof of Lemma 11.
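The sampling argument used in this proof can be illustrated with a short simulation. This is only a sketch under stated assumptions: the Euclidean norm replaces $\|\cdot\|_A$ (so the $\log M$ factor from the Khintchine step does not appear, and the mean squared error is at most $1/L$), and the helper names `sample_Z`, `mean_squared_error`, and `log_cover_bound` are hypothetical.

```python
import numpy as np

def sample_Z(beta, L, rng):
    """Draw L independent copies of Z as rows of an (L, N) array.

    Z equals sgn(beta(n)) * e_n with probability |beta(n)| and equals 0
    with the remaining probability 1 - ||beta||_1, so that E[Z] = beta.
    """
    N = beta.size
    probs = np.append(np.abs(beta), max(0.0, 1.0 - np.abs(beta).sum()))
    probs = probs / probs.sum()  # guard against floating-point rounding
    idx = rng.choice(N + 1, size=L, p=probs)  # index N encodes Z = 0
    Z = np.zeros((L, N))
    hit = idx < N
    Z[np.nonzero(hit)[0], idx[hit]] = np.sign(beta[idx[hit]])
    return Z

def mean_squared_error(beta, L, trials, seed=0):
    """Monte Carlo estimate of E || beta - (1/L) sum_l Z_l ||_2^2.

    Assumption: Euclidean norm as a stand-in for ||.||_A.
    """
    rng = np.random.default_rng(seed)
    errs = [np.sum((beta - sample_Z(beta, L, rng).mean(axis=0)) ** 2)
            for _ in range(trials)]
    return float(np.mean(errs))

def log_cover_bound(N, L):
    """Logarithm of (2N + 1)^L, the number of possible averages."""
    return L * np.log(2 * N + 1)

rng = np.random.default_rng(1)
b = rng.standard_normal(50)
beta = 0.9 * b / np.abs(b).sum()  # a point in the real l1-ball
print(mean_squared_error(beta, L=400, trials=100), 1.0 / 400)
print(log_cover_bound(N=50, L=400))
```

For the Euclidean stand-in, the exact mean squared error is $(\|\beta\|_1 - \|\beta\|_2^2)/L$, so the printed estimate should sit below $1/L$; the counting function mirrors the $(2N+1)^L$ bound on the number of candidate averages.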